Llama 4 Maverick
Llama 4 Maverick is Meta's 2025 open-weights flagship in the Llama 4 family: a mixture-of-experts model that trades a small quality gap against closed frontier models for full weights access, predictable self-hosting economics, and freedom from per-token vendor lock-in. It is the default choice when sovereignty or on-prem inference matters.
Model specs
- Vendor: Meta
- Family: Llama 4
- Released: 2025-04
- Context window: 1,000,000 tokens
- Modalities: text, vision
- Input price: $0.20/M tokens
- Output price: $0.60/M tokens
- Pricing as of: 2026-04-20
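At the listed rates, per-request cost is straightforward to estimate. A minimal sketch (the token counts in the example are illustrative, not measured):

```python
# Estimate hosted-API cost from the listed per-million-token prices.
INPUT_PRICE_PER_M = 0.20   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 4,000-token prompt with a 1,000-token completion:
cost = request_cost(4_000, 1_000)  # 0.0008 + 0.0006 = 0.0014 USD
```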
Strengths
- Open weights — full fine-tuning and inference control
- Competitive with closed mid-tier on general reasoning
- Mixture-of-Experts makes per-token inference economical
- Mature ecosystem — vLLM, TensorRT-LLM, llama.cpp, Ollama
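Because vLLM exposes an OpenAI-compatible HTTP endpoint, a self-hosted deployment can be queried with plain standard-library code. A minimal sketch, assuming a vLLM server on its default port 8000; the model id here is a placeholder, not a confirmed Hugging Face repo name:

```python
# Build a chat-completions request against a self-hosted vLLM server.
# Assumptions: server at localhost:8000, placeholder model id.
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # vLLM default port
payload = {
    "model": "meta-llama/Llama-4-Maverick",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Summarize MoE routing in one sentence."}
    ],
    "max_tokens": 128,
}

def build_request(url: str = VLLM_URL) -> urllib.request.Request:
    """Build the HTTP request; send it with urllib.request.urlopen(req)
    once a live server is running."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
```

The same payload works unchanged against hosted OpenAI-compatible providers; only the URL and an API-key header differ.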
Limitations
- Behind Claude Opus 4.7 / GPT-5 on frontier agentic tasks
- Vision is good but not SOTA vs. Gemini 2.5 Pro
- Self-host ops cost — GPUs, networking, observability
Use cases
- Self-hosted assistants with full data sovereignty
- Custom fine-tunes — legal, medical, regional language
- Batch inference pipelines where closed API cost is prohibitive
- Research groups needing weights for interpretability work
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU-Pro | ≈81% | 2026-04 |
| HumanEval | ≈88% | 2026-04 |
Frequently asked questions
What is Llama 4 Maverick?
Llama 4 Maverick is Meta's 2025 flagship open-weights large language model — a Mixture-of-Experts system released under the Llama 4 community license, usable on-prem or via hosted providers like Together, Fireworks, or Groq.
Is Llama 4 Maverick free?
Weights are downloadable and usable under the Llama 4 community license (with revenue-threshold restrictions). Running inference is not free — you pay for compute if self-hosting, or per-token on hosted providers.
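A back-of-envelope comparison of the two cost models can be sketched in a few lines. The $20/hour multi-GPU rental figure below is an illustrative assumption, not a quoted price; only the per-token rates come from the specs above:

```python
# Hosted per-token spend vs. a flat self-host GPU bill (rough sketch).
HOSTED_IN = 0.20 / 1e6   # USD per input token (listed rate)
HOSTED_OUT = 0.60 / 1e6  # USD per output token (listed rate)
GPU_HOURLY = 20.0        # assumed multi-GPU node rental, USD/hour

def hosted_monthly(in_tok: float, out_tok: float) -> float:
    """Monthly hosted-API cost for the given token volumes."""
    return in_tok * HOSTED_IN + out_tok * HOSTED_OUT

def selfhost_monthly(hours: float = 730) -> float:
    """Monthly cost of keeping the node up (730 h is roughly one month)."""
    return hours * GPU_HOURLY

# At ~20B input + 10B output tokens/month, hosted is about $10,000,
# while the assumed always-on node is about $14,600.
```

The crossover point depends entirely on utilization: steady high-volume batch traffic favors self-hosting, while bursty or low-volume traffic favors per-token pricing.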
Why pick Llama 4 over Claude or GPT-5?
Open weights give you data sovereignty, fine-tuning freedom, and predictable self-host economics. Pick Llama when you need to run inside your own infra, fine-tune on proprietary data, or avoid per-token vendor dependency.
Sources
- Meta — Llama models — accessed 2026-04-20
- Hugging Face — Meta Llama collection — accessed 2026-04-20