Llama 4 Maverick

Llama 4 Maverick is Meta's 2025 open-weights flagship in the Llama 4 family: a mixture-of-experts model that trades a small quality gap against closed frontier models for full weights access, predictable self-hosting economics, and freedom from per-token vendor lock-in. It is the default choice when data sovereignty or on-prem inference matters.

Model specs

Vendor: Meta
Family: Llama 4
Released: 2025-04
Context window: 1,000,000 tokens
Modalities: text, vision
Input price: $0.20 / M tokens
Output price: $0.60 / M tokens
Pricing as of: 2026-04-20
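At the listed rates, per-request cost is simple arithmetic. A minimal sketch using the prices quoted above; the token counts in the example are illustrative:

```python
# Per-request cost at the listed Llama 4 Maverick rates.
# Prices are USD per million tokens, as quoted in the specs above.
INPUT_PRICE = 0.20   # USD / 1M input tokens
OUTPUT_PRICE = 0.60  # USD / 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# Example: a 100k-token prompt with a 20k-token completion.
print(f"${request_cost(100_000, 20_000):.4f}")  # → $0.0320
```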

Strengths

  • Open weights — full fine-tuning and inference control
  • Competitive with closed mid-tier on general reasoning
  • Mixture-of-Experts routing activates only a fraction of parameters per token, keeping inference economical
  • Mature ecosystem — vLLM, TensorRT-LLM, llama.cpp, Ollama

Limitations

  • Behind Claude Opus 4.7 / GPT-5 on frontier agentic tasks
  • Vision is good but not SOTA vs. Gemini 2.5 Pro
  • Self-host ops cost — GPUs, networking, observability

Use cases

  • Self-hosted assistants with full data sovereignty
  • Custom fine-tunes — legal, medical, regional language
  • Batch inference pipelines where closed API cost is prohibitive
  • Research groups needing weights for interpretability work
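Whichever deployment path you take, both self-hosted servers such as vLLM and the hosted providers mentioned below expose an OpenAI-compatible chat completions endpoint, so one request shape covers all of these use cases. A sketch of that request body; the model identifier is a hypothetical placeholder, so check your provider's catalog for the exact name:

```python
import json

# OpenAI-compatible /v1/chat/completions request body.
# The model name below is illustrative, not an official identifier.
payload = {
    "model": "meta-llama/Llama-4-Maverick",  # hypothetical identifier
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the Llama 4 license in one sentence."},
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}

# Serialize for an HTTP POST to the server's /v1/chat/completions route.
body = json.dumps(payload)
```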

Benchmarks

MMLU-Pro: ≈81% (as of 2026-04)
HumanEval: ≈88% (as of 2026-04)

Frequently asked questions

What is Llama 4 Maverick?

Llama 4 Maverick is Meta's 2025 flagship open-weights large language model — a Mixture-of-Experts system released under the Llama 4 community license, usable on-prem or via hosted providers like Together, Fireworks, or Groq.

Is Llama 4 Maverick free?

Weights are downloadable and usable under the Llama 4 community license (with revenue-threshold restrictions). Running inference is not free — you pay for compute if self-hosting, or per-token on hosted providers.
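A rough way to weigh the two options is to convert a GPU-hour price into an effective per-million-token rate and compare it against the hosted price. In the sketch below, only the $0.60/M output rate comes from the card above; the GPU-hour cost and throughput are hypothetical placeholders to substitute with your own hardware numbers:

```python
# Break-even sketch: hosted per-token pricing vs. self-hosting.
HOSTED_OUTPUT_PRICE = 0.60   # USD per 1M output tokens (listed above)
GPU_HOUR_COST = 12.0         # USD/hour for a multi-GPU node (assumed)
TOKENS_PER_SECOND = 1_500    # aggregate decode throughput (assumed)

tokens_per_hour = TOKENS_PER_SECOND * 3600
self_host_price_per_million = GPU_HOUR_COST / tokens_per_hour * 1_000_000

# At these assumed numbers, self-hosting is more expensive per token
# unless utilization or throughput improves.
print(f"self-host: ${self_host_price_per_million:.2f}/M "
      f"vs hosted: ${HOSTED_OUTPUT_PRICE:.2f}/M")
```

At low utilization the hosted rate usually wins; self-hosting pays off only when the cluster stays busy, which is why the limitations above call out ops cost.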

Why pick Llama 4 over Claude or GPT-5?

Open weights give you data sovereignty, fine-tuning freedom, and predictable self-host economics. Pick Llama when you need to run inside your own infra, fine-tune on proprietary data, or avoid per-token vendor dependency.

Sources

  1. Meta — Llama models — accessed 2026-04-20
  2. Hugging Face — Meta Llama collection — accessed 2026-04-20