
MPT-30B

MPT-30B is MosaicML's open-weight, 30-billion-parameter transformer, released in 2023 as a landmark commercially licensed LLM shortly before MosaicML was acquired by Databricks. With an 8k context window and the company's FlashAttention-plus-ALiBi training recipe, it was one of the first widely used open models that organisations could deploy without the ambiguity of Llama's community licence.
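The ALiBi half of that recipe replaces learned positional embeddings with a fixed, per-head linear penalty on attention scores, which is what allows inference beyond the training length. A minimal sketch of the bias computation is shown below; the head count and sequence length are illustrative, not MPT-30B's actual configuration.

    import numpy as np

    def alibi_slopes(n_heads):
        # Geometric sequence of per-head slopes from the ALiBi paper
        # (assumes n_heads is a power of two).
        start = 2 ** (-8.0 / n_heads)
        return np.array([start ** (i + 1) for i in range(n_heads)])

    def alibi_bias(n_heads, seq_len):
        # Additive attention bias of shape (n_heads, seq_len, seq_len).
        # Each head penalises distant keys by slope * distance, so no
        # positional embedding is learned and longer contexts degrade gracefully.
        pos = np.arange(seq_len)
        distance = np.minimum(pos[None, :] - pos[:, None], 0)  # causal: look backwards only
        return alibi_slopes(n_heads)[:, None, None] * distance

    # Illustrative sizes only -- not MPT-30B's real head count.
    print(alibi_bias(n_heads=8, seq_len=6).shape)  # (8, 6, 6)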

Model specs

Vendor: MosaicML
Family: MPT
Released: 2023-06
Context window: 8,192 tokens
Modalities: text, code

Strengths

  • Apache-2.0 licensed, fully permissive
  • Helped popularise training-efficiency techniques (FlashAttention, ALiBi) in open models
  • Mosaic's training code is well documented for teaching

Limitations

  • Benchmarks far behind 2025-2026 open models
  • Short 8k context by modern standards
  • MosaicML was acquired by Databricks, and the MPT line has largely been superseded by DBRX

Use cases

  • Historical baselines in research papers
  • Organisations requiring strict Apache-2.0 models
  • Teaching transformer training recipes (FlashAttention, ALiBi)
  • Niche fine-tuning where old but stable weights are wanted (a loading sketch follows this list)
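For the baseline and fine-tuning use cases above, the original weights remain on the Hugging Face Hub. A minimal loading sketch with the transformers library follows; the repository ID matches MosaicML's published model, but check the model card for current hardware and dependency requirements, and note that MPT's custom architecture needs trust_remote_code=True.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mosaicml/mpt-30b"  # base model; instruct/chat variants carry different licences

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # ~60 GB of weights in bf16: plan for multiple GPUs or offloading
        device_map="auto",           # requires the accelerate package
        trust_remote_code=True,      # MPT ships custom modelling code with the checkpoint
    )

    prompt = "MPT-30B was trained with"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))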

Benchmarks

Benchmark    Score   As of
MMLU         ~47%    2026-04
HumanEval    ~25%    2026-04
HellaSwag    ~80%    2026-04

Frequently asked questions

What is MPT-30B?

MPT-30B is MosaicML's 30-billion-parameter open-weight transformer language model, released in mid-2023 under Apache 2.0 with an 8k-token context window and a training recipe built on FlashAttention and ALiBi positional biases.

Should I still use MPT-30B?

For new work, no — successors like DBRX Instruct and Llama 3 are far better. MPT-30B remains a useful historical reference and a strictly Apache-licensed baseline.

Sources

  1. MPT-30B on HuggingFace — accessed 2026-04-20
  2. MosaicML MPT announcement — accessed 2026-04-20