Curiosity · AI Model
Llama 4 Scout
Llama 4 Scout is Meta's 2025 efficiency-tier open-weights model in the Llama 4 family — a Mixture-of-Experts design that runs on a single H100 GPU while still offering an unusually long context window. Pick Scout when you want Llama 4's native multimodality and MoE economics without the Maverick compute footprint.
Model specs
| Spec | Value |
|---|---|
| Vendor | Meta |
| Family | Llama 4 |
| Released | 2025-04 |
| Context window | 10,000,000 tokens |
| Modalities | text, vision |
| Input price | $0.10 / M tokens |
| Output price | $0.30 / M tokens |
| Pricing as of | 2026-04-20 |
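At the list prices above, per-request cost is easy to estimate. A minimal sketch — the two prices come from the spec table; the token counts in the example are illustrative, not measured:

```python
# Scout list prices from the spec table (USD per million tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.30

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at Scout's list prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Illustrative example: a 50k-token document summarized into 1k tokens.
print(round(request_cost(50_000, 1_000), 6))  # → 0.0053
```

At these prices, input tokens dominate for long-document workloads — a full 1M-token prompt costs about $0.10 before any output is generated.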
Strengths
- Open weights under the Llama 4 community license
- Industry-leading 10M token context for document-heavy workloads
- Single-GPU inference with 4-bit quantization
- Native multimodal (text + vision) from pretraining
Limitations
- Trails Maverick on reasoning and code benchmarks
- Long-context quality degrades beyond ~1M tokens in practice, well short of the advertised 10M
- Behind Claude / GPT-5 on agentic multi-step tasks
- Self-host ops burden — you own the GPUs and observability
Use cases
- Single-GPU on-prem deployments and lab work
- Long-document RAG pipelines with 10M token window
- Fine-tunes on domain data where Maverick cost is prohibitive
- Edge servers and sovereign-cloud inference
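To put the 10M-token window in perspective for RAG pipelines, a quick chunk-budget sketch — the window size comes from the spec table, while the chunk size and reserved budget are illustrative assumptions, not Scout parameters:

```python
CONTEXT_WINDOW = 10_000_000  # tokens, from the spec table

def max_chunks(chunk_tokens: int, reserved_tokens: int = 50_000) -> int:
    """Retrieval chunks that fit after reserving a prompt/answer budget.

    chunk_tokens and reserved_tokens are illustrative pipeline choices.
    """
    return (CONTEXT_WINDOW - reserved_tokens) // chunk_tokens

print(max_chunks(512))  # → 19433 chunks of 512 tokens each
```

Note the limitation above, though: with quality degrading beyond ~1M tokens, budgeting a tenth of the nominal window may be the more realistic planning number.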
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU-Pro | ≈74% | 2026-04 |
| LiveCodeBench | ≈32% | 2026-04 |
Frequently asked questions
What is Llama 4 Scout?
Llama 4 Scout is the smaller of Meta's two Llama 4 open-weights models, launched in April 2025. It is a 17B-active / 109B-total-parameter Mixture-of-Experts LLM with a 10M-token context window, distributed under the Llama 4 community license.
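Those two parameter counts drive the deployment math: memory scales with total parameters (all experts are resident), while per-token compute scales with active parameters. A rough sketch — the 4-bit figure is the quantization level cited above, and the calculation ignores KV cache and activation memory:

```python
TOTAL_PARAMS = 109e9   # all experts, must fit in memory
ACTIVE_PARAMS = 17e9   # parameters actually used per token

def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """GB needed for the weights alone (ignores KV cache / activations)."""
    return params * bits_per_weight / 8 / 1e9

print(round(weight_memory_gb(TOTAL_PARAMS, 4), 1))   # → 54.5 (fits an 80 GB H100)
print(round(weight_memory_gb(TOTAL_PARAMS, 16), 1))  # → 218.0 (why bf16 needs multiple GPUs)
print(f"{ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")         # → 16% of weights active per token
```

This is the MoE trade-off in miniature: you pay Maverick-style memory for the full expert set, but only dense-17B-style compute per generated token.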
Can Llama 4 Scout run on one GPU?
Yes — Scout is designed to run on a single NVIDIA H100 with 4-bit quantization, making it the practical on-prem choice; Maverick requires multiple GPUs.
When should I pick Scout over Maverick?
Pick Scout when single-GPU economics, long-context RAG, or sovereign deployment matter more than absolute reasoning quality. Pick Maverick when you need frontier-tier performance and have multi-GPU budget.
Sources
- Meta — The Llama 4 Herd — accessed 2026-04-20
- Hugging Face — meta-llama/Llama-4-Scout — accessed 2026-04-20