
MPT-30B

MPT-30B is MosaicML's open-weight, 30-billion-parameter transformer, released in 2023 as a landmark commercially licensed LLM shortly before MosaicML was acquired by Databricks. With an 8k context window and the company's FlashAttention-plus-ALiBi training recipe, it was one of the first widely used open models that organisations could deploy without the ambiguity of Llama's community licence.
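The ALiBi half of that recipe replaces learned positional embeddings with a fixed, per-head linear penalty on attention scores, which is what allows inference beyond the training length. A minimal sketch of the bias computation is shown below; the head count and sequence length are illustrative, not MPT-30B's actual configuration.

    import numpy as np

    def alibi_slopes(n_heads):
        # Geometric sequence of per-head slopes from the ALiBi paper
        # (assumes n_heads is a power of two).
        start = 2 ** (-8.0 / n_heads)
        return np.array([start ** (i + 1) for i in range(n_heads)])

    def alibi_bias(n_heads, seq_len):
        # Additive attention bias of shape (n_heads, seq_len, seq_len).
        # Each head penalises distant keys by slope * distance, so no
        # positional embedding is learned and longer contexts degrade gracefully.
        pos = np.arange(seq_len)
        distance = np.minimum(pos[None, :] - pos[:, None], 0)  # causal: look backwards only
        return alibi_slopes(n_heads)[:, None, None] * distance

    # Illustrative sizes only -- not MPT-30B's real head count.
    print(alibi_bias(n_heads=8, seq_len=6).shape)  # (8, 6, 6)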

Model specs

Vendor: MosaicML
Family: MPT
Released: 2023-06
Context window: 8,192 tokens
Modalities: text, code

Strengths

  • Apache-2.0 licensed, fully permissive
  • Helped popularise training-efficiency techniques (FlashAttention, ALiBi) in open models
  • Mosaic's training code is well documented for teaching

Limitations

  • Benchmarks far behind 2025-2026 open models
  • Short 8k context by modern standards
  • MosaicML was acquired by Databricks, and the MPT line has largely been superseded by DBRX

Use cases

  • Historical baselines in research papers
  • Organisations requiring strict Apache-2.0 models
  • Teaching transformer training recipes (FlashAttention, ALiBi)
  • Niche fine-tuning where old but stable weights are wanted (a loading sketch follows this list)
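For the baseline and fine-tuning use cases above, the original weights remain on the Hugging Face Hub. A minimal loading sketch with the transformers library follows; the repository ID matches MosaicML's published model, but check the model card for current hardware and dependency requirements, and note that MPT's custom architecture needs trust_remote_code=True.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mosaicml/mpt-30b"  # base model; instruct/chat variants carry different licences

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # ~60 GB of weights in bf16: plan for multiple GPUs or offloading
        device_map="auto",           # requires the accelerate package
        trust_remote_code=True,      # MPT ships custom modelling code with the checkpoint
    )

    prompt = "MPT-30B was trained with"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))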

Benchmarks

Benchmark    Score   As of
MMLU         ~47%    2026-04
HumanEval    ~25%    2026-04
HellaSwag    ~80%    2026-04

Frequently asked questions

What is MPT-30B?

MPT-30B is MosaicML's 30-billion-parameter open-weight transformer language model, released in mid-2023 under Apache 2.0 with an 8k-token context window and a training recipe built on FlashAttention and ALiBi positional biases.

Should I still use MPT-30B?

For new work, no — successors like DBRX Instruct and Llama 3 are far better. MPT-30B remains a useful historical reference and a strictly Apache-licensed baseline.

Sources

  1. MPT-30B on HuggingFace — accessed 2026-04-20
  2. MosaicML MPT announcement — accessed 2026-04-20