Curiosity · AI Model
Llama 4 Scout
Llama 4 Scout is Meta's 2025 efficiency-tier open-weights model in the Llama 4 family — a Mixture-of-Experts design that runs on a single H100 GPU while still offering an unusually long context window. Pick Scout when you want Llama 4's native multimodality and MoE economics without the Maverick compute footprint.
Model specs
| Spec | Value |
|---|---|
| Vendor | Meta |
| Family | Llama 4 |
| Released | 2025-04 |
| Context window | 10,000,000 tokens |
| Modalities | text, vision |
| Input price | $0.10 / M tokens |
| Output price | $0.30 / M tokens |
| Pricing as of | 2026-04-20 |
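At the list prices above, per-request cost is easy to estimate. A minimal sketch — the two prices come from the spec table; the token counts in the example are illustrative, not measured:

```python
# Scout list prices from the spec table (USD per million tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.30

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at Scout's list prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Illustrative example: a 50k-token document summarized into 1k tokens.
print(round(request_cost(50_000, 1_000), 6))  # → 0.0053
```

At these prices, input tokens dominate for long-document workloads — a full 1M-token prompt costs about $0.10 before any output is generated.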
Strengths
- Open weights under the Llama 4 community license
- Industry-leading 10M token context for document-heavy workloads
- Single-GPU inference with 4-bit quantization
- Native multimodal (text + vision) from pretraining
Limitations
- Trails Maverick on reasoning and code benchmarks
- Long-context quality degrades beyond ~1M tokens in practice, well short of the advertised 10M
- Behind Claude / GPT-5 on agentic multi-step tasks
- Self-host ops burden — you own the GPUs and observability
Use cases
- Single-GPU on-prem deployments and lab work
- Long-document RAG pipelines with 10M token window
- Fine-tunes on domain data where Maverick cost is prohibitive
- Edge servers and sovereign-cloud inference
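To put the 10M-token window in perspective for RAG pipelines, a quick chunk-budget sketch — the window size comes from the spec table, while the chunk size and reserved budget are illustrative assumptions, not Scout parameters:

```python
CONTEXT_WINDOW = 10_000_000  # tokens, from the spec table

def max_chunks(chunk_tokens: int, reserved_tokens: int = 50_000) -> int:
    """Retrieval chunks that fit after reserving a prompt/answer budget.

    chunk_tokens and reserved_tokens are illustrative pipeline choices.
    """
    return (CONTEXT_WINDOW - reserved_tokens) // chunk_tokens

print(max_chunks(512))  # → 19433 chunks of 512 tokens each
```

Note the limitation above, though: with quality degrading beyond ~1M tokens, budgeting a tenth of the nominal window may be the more realistic planning number.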
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU-Pro | ≈74% | 2026-04 |
| LiveCodeBench | ≈32% | 2026-04 |
Frequently asked questions
What is Llama 4 Scout?
Llama 4 Scout is the smaller of Meta's two Llama 4 open-weights models, launched in April 2025. It is a 17B-active / 109B-total-parameter Mixture-of-Experts LLM with a 10M-token context window, distributed under the Llama 4 community license.
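Those two parameter counts drive the deployment math: memory scales with total parameters (all experts are resident), while per-token compute scales with active parameters. A rough sketch — the 4-bit figure is the quantization level cited above, and the calculation ignores KV cache and activation memory:

```python
TOTAL_PARAMS = 109e9   # all experts, must fit in memory
ACTIVE_PARAMS = 17e9   # parameters actually used per token

def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """GB needed for the weights alone (ignores KV cache / activations)."""
    return params * bits_per_weight / 8 / 1e9

print(round(weight_memory_gb(TOTAL_PARAMS, 4), 1))   # → 54.5 (fits an 80 GB H100)
print(round(weight_memory_gb(TOTAL_PARAMS, 16), 1))  # → 218.0 (why bf16 needs multiple GPUs)
print(f"{ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")         # → 16% of weights active per token
```

This is the MoE trade-off in miniature: you pay Maverick-style memory for the full expert set, but only dense-17B-style compute per generated token.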
Can Llama 4 Scout run on one GPU?
Yes — Scout is designed to run on a single NVIDIA H100 with 4-bit quantization, making it the practical on-prem choice; Maverick requires multiple GPUs.
When should I pick Scout over Maverick?
Pick Scout when single-GPU economics, long-context RAG, or sovereign deployment matter more than absolute reasoning quality. Pick Maverick when you need frontier-tier performance and have multi-GPU budget.
Sources
- Meta — The Llama 4 Herd — accessed 2026-04-20
- Hugging Face — meta-llama/Llama-4-Scout — accessed 2026-04-20