Llama 4 Maverick
Llama 4 Maverick is Meta's 2025 open-weights flagship in the Llama 4 family: a mixture-of-experts model that trades a small quality gap against closed frontier models for full weights access, predictable self-hosting economics, and freedom from per-token vendor lock-in. It is the default choice when sovereignty or on-prem inference matters.
Model specs
- Vendor: Meta
- Family: Llama 4
- Released: 2025-04
- Context window: 1,000,000 tokens
- Modalities: text, vision
- Input price: $0.20/M tokens
- Output price: $0.60/M tokens
- Pricing as of: 2026-04-20
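At the listed rates, per-request cost is straightforward to estimate. A minimal sketch (the token counts in the example are illustrative, not measured):

```python
# Estimate hosted-API cost from the listed per-million-token prices.
INPUT_PRICE_PER_M = 0.20   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 4,000-token prompt with a 1,000-token completion:
cost = request_cost(4_000, 1_000)  # 0.0008 + 0.0006 = 0.0014 USD
```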
Strengths
- Open weights — full fine-tuning and inference control
- Competitive with closed mid-tier on general reasoning
- Mixture-of-Experts makes per-token inference economical
- Mature ecosystem — vLLM, TensorRT-LLM, llama.cpp, Ollama
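Because vLLM exposes an OpenAI-compatible HTTP endpoint, a self-hosted deployment can be queried with plain standard-library code. A minimal sketch, assuming a vLLM server on its default port 8000; the model id here is a placeholder, not a confirmed Hugging Face repo name:

```python
# Build a chat-completions request against a self-hosted vLLM server.
# Assumptions: server at localhost:8000, placeholder model id.
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # vLLM default port
payload = {
    "model": "meta-llama/Llama-4-Maverick",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Summarize MoE routing in one sentence."}
    ],
    "max_tokens": 128,
}

def build_request(url: str = VLLM_URL) -> urllib.request.Request:
    """Build the HTTP request; send it with urllib.request.urlopen(req)
    once a live server is running."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
```

The same payload works unchanged against hosted OpenAI-compatible providers; only the URL and an API-key header differ.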
Limitations
- Behind Claude Opus 4.7 / GPT-5 on frontier agentic tasks
- Vision is good but not SOTA vs. Gemini 2.5 Pro
- Self-host ops cost — GPUs, networking, observability
Use cases
- Self-hosted assistants with full data sovereignty
- Custom fine-tunes — legal, medical, regional language
- Batch inference pipelines where closed API cost is prohibitive
- Research groups needing weights for interpretability work
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU-Pro | ≈81% | 2026-04 |
| HumanEval | ≈88% | 2026-04 |
Frequently asked questions
What is Llama 4 Maverick?
Llama 4 Maverick is Meta's 2025 flagship open-weights large language model — a Mixture-of-Experts system released under the Llama 4 community license, usable on-prem or via hosted providers like Together, Fireworks, or Groq.
Is Llama 4 Maverick free?
Weights are downloadable and usable under the Llama 4 community license (with revenue-threshold restrictions). Running inference is not free — you pay for compute if self-hosting, or per-token on hosted providers.
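A back-of-envelope comparison of the two cost models can be sketched in a few lines. The $20/hour multi-GPU rental figure below is an illustrative assumption, not a quoted price; only the per-token rates come from the specs above:

```python
# Hosted per-token spend vs. a flat self-host GPU bill (rough sketch).
HOSTED_IN = 0.20 / 1e6   # USD per input token (listed rate)
HOSTED_OUT = 0.60 / 1e6  # USD per output token (listed rate)
GPU_HOURLY = 20.0        # assumed multi-GPU node rental, USD/hour

def hosted_monthly(in_tok: float, out_tok: float) -> float:
    """Monthly hosted-API cost for the given token volumes."""
    return in_tok * HOSTED_IN + out_tok * HOSTED_OUT

def selfhost_monthly(hours: float = 730) -> float:
    """Monthly cost of keeping the node up (730 h is roughly one month)."""
    return hours * GPU_HOURLY

# At ~20B input + 10B output tokens/month, hosted is about $10,000,
# while the assumed always-on node is about $14,600.
```

The crossover point depends entirely on utilization: steady high-volume batch traffic favors self-hosting, while bursty or low-volume traffic favors per-token pricing.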
Why pick Llama 4 over Claude or GPT-5?
Open weights give you data sovereignty, fine-tuning freedom, and predictable self-host economics. Pick Llama when you need to run inside your own infra, fine-tune on proprietary data, or avoid per-token vendor dependency.
Sources
- Meta — Llama models — accessed 2026-04-20
- Hugging Face — Meta Llama collection — accessed 2026-04-20