MPT-30B
MPT-30B is MosaicML's open-weight 30-billion-parameter transformer, released in June 2023 shortly before the company was acquired by Databricks. Trained with MosaicML's FlashAttention and ALiBi recipe and an 8,192-token context window, it was a landmark commercially licensable LLM and one of the first widely used open models that organisations could deploy without the ambiguity of the Llama community licence.
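For teams evaluating it today, loading follows the standard Hugging Face transformers path described in MosaicML's model card. The sketch below is illustrative only: the repo id mosaicml/mpt-30b is the published one, but the dtype and sequence-length settings are assumptions you may need to adapt to your hardware.

```python
# Minimal sketch of loading MPT-30B via Hugging Face transformers, based on
# MosaicML's model card; dtype and sequence-length settings are illustrative.
import torch
import transformers

name = "mosaicml/mpt-30b"

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
# ALiBi uses linear attention biases rather than learned position embeddings,
# so max_seq_len can be raised at load time (quality may degrade beyond the
# 8,192-token training length).
config.max_seq_len = 8192

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable hardware
    trust_remote_code=True,      # MPT ships custom modelling code
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)

inputs = tokenizer("MPT-30B is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```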
Model specs
- Vendor: MosaicML
- Family: MPT
- Released: 2023-06
- Context window: 8,192 tokens
- Modalities: text, code
Strengths
- Apache-2.0 licensed, fully permissive
- Pioneered several open training-efficiency techniques
- Mosaic's training code is well documented for teaching
Limitations
- Benchmarks far behind 2025-2026 open models
- Short 8k context by modern standards
- MosaicML was acquired by Databricks; the MPT line has largely been superseded by DBRX
Use cases
- Historical baselines in research papers
- Organisations requiring strict Apache-2.0 models
- Teaching transformer training recipes (FlashAttention, ALiBi); see the ALiBi sketch after this list
- Niche fine-tuning where old but stable weights are wanted
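Since ALiBi is the part of the recipe that most often needs explaining, here is a minimal sketch of how its per-head linear attention biases can be computed; the helper name alibi_bias is hypothetical, and the slope formula is exact only for head counts that are powers of two.

```python
# Minimal sketch of ALiBi attention biases (the scheme MPT uses in place of
# positional embeddings); head count and sequence length are illustrative.
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Return a (n_heads, seq_len, seq_len) tensor of linear attention biases."""
    # Geometric slopes from the ALiBi paper: head h gets 2 ** (-8 * h / n_heads).
    slopes = torch.tensor(
        [2.0 ** (-8.0 * h / n_heads) for h in range(1, n_heads + 1)]
    )
    # Distance (i - j) of each key position j behind each query position i;
    # future positions are clamped to zero since the causal mask hides them.
    pos = torch.arange(seq_len)
    distance = (pos.view(-1, 1) - pos.view(1, -1)).clamp(min=0)
    # The bias grows more negative with distance, scaled per head.
    return -slopes.view(-1, 1, 1) * distance.float()
```

The bias is simply added to the attention logits before the softmax, which is what lets ALiBi-trained models extrapolate modestly beyond their training context without any positional embedding table.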
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU | ~47% | 2026-04 |
| HumanEval | ~25% | 2026-04 |
| HellaSwag | ~80% | 2026-04 |
Frequently asked questions
What is MPT-30B?
MPT-30B is MosaicML's 30-billion-parameter open-weight transformer language model, released in mid-2023 under Apache 2.0, with an 8k context window and training innovations such as FlashAttention and ALiBi attention biases in place of positional embeddings.
Should I still use MPT-30B?
For new work, no — successors like DBRX Instruct and Llama 3 are far better. MPT-30B remains a useful historical reference and a strictly Apache-licensed baseline.
Sources
- MPT-30B on HuggingFace — accessed 2026-04-20
- MosaicML MPT announcement — accessed 2026-04-20