Curiosity · AI Model

Llama 3.1 405B Instruct

Llama 3.1 405B Instruct is Meta's July 2024 flagship open-weights model — a 405B dense transformer that was, at release, the largest openly available LLM and the first open model to reach GPT-4-class quality on major benchmarks. It remains a landmark release for the open ecosystem and a common baseline for research.

Model specs

Vendor
Meta
Family
Llama 3
Released
2024-07
Context window
128,000 tokens
Modalities
text
Input price
$3.5/M tok
Output price
$3.5/M tok
Pricing as of
2026-04-20

Strengths

  • Open weights — the largest credible open release at 2024 launch
  • GPT-4-class quality on reasoning, math, and coding benchmarks
  • Excellent synthetic-data source for distilling 8B and 70B targets
  • 128K context window with strong retrieval performance

Limitations

  • 405B dense — BF16 inference needs roughly 16x H100 80GB (two 8-GPU nodes); FP8 fits a single node
  • Higher inference cost per token than Llama 3.3 70B with small quality delta
  • Now surpassed by Llama 4 Maverick and DeepSeek V3 on most tasks
  • Text-only — no vision or audio

Use cases

  • Research baselines for alignment and interpretability
  • Synthetic data generation for distilling smaller models
  • Batch inference where absolute quality justifies cost
  • Sovereign deployments requiring frontier-tier open models
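
The distillation use case above can be sketched as a small pipeline. This is a minimal, hedged example: the request payload targets any OpenAI-compatible hosted endpoint (Together, Fireworks, etc.), and the sample instruction and response text are placeholders for illustration, not real teacher output.

```python
import json

# Sketch of using 405B as a distillation teacher. The endpoint call is
# stubbed out; in practice you would POST `build_request(...)` to an
# OpenAI-compatible provider and collect the completion text.

def build_request(instruction: str) -> dict:
    """Chat-completion payload addressed to the teacher model."""
    return {
        "model": "meta-llama/Llama-3.1-405B-Instruct",
        "messages": [{"role": "user", "content": instruction}],
        "temperature": 0.7,
    }

def to_training_example(instruction: str, response: str) -> str:
    """Serialize one teacher-generated pair as a JSONL line for
    fine-tuning a smaller student (e.g. an 8B model)."""
    return json.dumps({"instruction": instruction, "response": response})

# Hypothetical teacher output, standing in for a live API call.
line = to_training_example("Explain KV caching.", "KV caching stores...")
```

Collecting many such JSONL lines yields a supervised fine-tuning set for the smaller target model.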

Benchmarks

Benchmark     Score   As of
MMLU          ≈88%    2024-07
HumanEval     ≈89%    2024-07
MATH          ≈73%    2024-07

Frequently asked questions

Why was Llama 3.1 405B a big deal?

It was the first open-weights LLM to credibly match GPT-4 class closed models on major public benchmarks. That broke the assumption that frontier quality required proprietary weights and kicked off the open-vs-closed convergence.

Should I deploy 405B or 70B?

Deploy 70B (Llama 3.3) in 2026 for almost every production use case — the post-training upgrade closed the gap. Use 405B only when you need the last percentage points of quality or as a teacher for distillation.

What hardware does 405B need?

In full BF16 the weights alone are roughly 810 GB, so you need about 16x H100 80GB (two 8-GPU nodes). With FP8 quantization the model fits on a single 8x H100 node. Hosted providers like Together, Fireworks, and Groq serve it per-token.
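
The sizing above is back-of-envelope arithmetic; a minimal sketch, counting weights only (KV cache, activations, and framework overhead add more on top):

```python
import math

# Rough VRAM sizing for Llama 3.1 405B, weights only.
PARAMS_B = 405      # parameters, in billions
H100_GB = 80        # one H100 80GB card

def weights_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_b * bytes_per_param

bf16 = weights_gb(PARAMS_B, 2.0)  # ~810 GB
fp8 = weights_gb(PARAMS_B, 1.0)   # ~405 GB

# Minimum cards for the weights alone; real deployments round up to
# full 8-GPU nodes for tensor-parallel sharding.
print(math.ceil(bf16 / H100_GB))  # 11 -> in practice 16 (two nodes)
print(math.ceil(fp8 / H100_GB))   # 6  -> in practice 8 (one node)
```

The gap between the raw minimum and the deployed count is why serving stacks quote whole nodes rather than card counts.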

Sources

  1. Meta — Introducing Llama 3.1 — accessed 2026-04-20
  2. Hugging Face — meta-llama/Llama-3.1-405B-Instruct — accessed 2026-04-20