Mixtral 8x22B

Mixtral 8x22B is Mistral AI's April 2024 flagship open-weights mixture-of-experts (MoE) model: 141B total parameters with ~39B active per token, released under the permissive Apache 2.0 license as a production-grade open model. Alongside the smaller Mixtral 8x7B, it helped define the open MoE category before DeepSeek V3 and Llama 4 took it further.

Model specs

Vendor: Mistral AI
Family: Mixtral
Released: 2024-04
Context window: 65,536 tokens
Modalities: text
Input price: $1.20/M tokens
Output price: $1.20/M tokens
Pricing as of: 2026-04-20
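With a flat $1.20 per million tokens for both input and output, per-request cost is simple arithmetic. A minimal sketch using the rates from the table above (the helper name is ours):

```python
# Hosted-API cost estimate for Mixtral 8x22B at the listed rates
# ($1.20 per million tokens for both input and output, as of 2026-04).
INPUT_PRICE_PER_M = 1.20
OUTPUT_PRICE_PER_M = 1.20

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the flat per-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion.
print(round(estimate_cost(2_000, 500), 4))  # → 0.003
```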

Strengths

  • Apache 2.0 — fully permissive open license for commercial use
  • MoE architecture — 141B total but only 39B active per token
  • Strong multilingual performance — native European language coverage
  • Established in many inference stacks — vLLM, TensorRT-LLM, Together, Fireworks
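The 141B-total / 39B-active split follows from top-2 routing over 8 expert FFN blocks per layer, with attention and embedding weights shared by every token. A back-of-envelope sketch; the ~5B shared / ~17B-per-expert breakdown is our inference from the published totals, not an official figure:

```python
# Back-of-envelope MoE parameter accounting for Mixtral 8x22B.
# Assumed split (inferred from the published 141B total / 39B active):
# shared weights (attention, embeddings, router) ~5B; each expert FFN ~17B.
SHARED_B = 5        # billions of parameters used by every token
EXPERT_B = 17       # billions of parameters per expert FFN
NUM_EXPERTS = 8
TOP_K = 2           # experts routed per token

total_params = SHARED_B + NUM_EXPERTS * EXPERT_B   # stored on disk/GPU
active_params = SHARED_B + TOP_K * EXPERT_B        # computed per token

print(total_params, active_params)  # → 141 39
```

This is the core MoE trade-off: you pay inference FLOPs for 39B parameters but must store all 141B.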

Limitations

  • Now trails DeepSeek V3, Llama 4, and Qwen 2.5 on most benchmarks
  • 65K context — modest compared to 128K–1M windows in 2026
  • MoE memory footprint is large even with efficient runtime
  • Newer Mistral Large 2 and Mistral Large 3 are proprietary, not open
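The memory-footprint limitation is easy to quantify: all 141B parameters must be resident regardless of how few are active per token. A rough sketch that covers weights only (KV cache, activations, and runtime overhead are extra):

```python
# Rough weight-memory footprint for Mixtral 8x22B at common precisions.
# Ignores KV cache, activations, and runtime overhead.
TOTAL_PARAMS = 141e9

def weight_gb(bytes_per_param: float) -> float:
    """GB of memory needed just to hold the weights."""
    return TOTAL_PARAMS * bytes_per_param / 1e9

for label, bpp in [("fp16/bf16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: ~{weight_gb(bpp):.1f} GB")
```

Even 4-bit quantization leaves a footprint that exceeds a single 48 GB or 64 GB accelerator, so multi-GPU serving is the norm.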

Use cases

  • High-throughput production chat on self-hosted infra
  • Multilingual workloads across French, German, Spanish, Italian
  • Fine-tuning base for domain models needing MoE efficiency
  • Coding pipelines where Apache 2.0 license matters
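For self-hosted chat serving, stacks like vLLM typically expose the model behind an OpenAI-compatible chat-completions endpoint. A hedged sketch that only constructs the request payload, no request is sent; the helper name and parameter choices are ours, and the model ID is the Hugging Face repo name:

```python
import json

# Chat-completions payload for an OpenAI-compatible server (e.g. a
# self-hosted vLLM instance). Only the payload is built here.
MODEL_ID = "mistralai/Mixtral-8x22B-Instruct-v0.1"

def build_chat_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a standard chat-completions request body."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }

payload = build_chat_payload("Summarize the Apache 2.0 license in one line.")
print(json.dumps(payload, indent=2))
```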

Benchmarks

Benchmark    Score   As of
MMLU         ≈77%    2024-04
HumanEval    ≈76%    2024-04
MATH         ≈41%    2024-04

Frequently asked questions

What is Mixtral 8x22B?

A Mixture-of-Experts open-weights LLM from Mistral AI. Each layer has 8 expert feed-forward blocks of which 2 are routed per token; because the experts share attention and embedding weights, the model totals 141B parameters (not 8 × 22B) while activating only ~39B per token. Released April 2024 under Apache 2.0.

Is Mixtral 8x22B still worth deploying?

For Apache 2.0 requirements and multilingual European workloads, yes. But for general quality-per-dollar, DeepSeek V3 or Llama 4 Maverick are now stronger open choices.

What's the difference between Mixtral 8x7B and 8x22B?

Both are Mistral MoE models, but 8x7B (46B total) targets smaller deployments while 8x22B (141B total) is the flagship. 8x22B is meaningfully stronger on reasoning, coding, and math.

Sources

  1. Mistral AI — Cheaper, Better, Faster, Stronger (Mixtral 8x22B) — accessed 2026-04-20
  2. Hugging Face — mistralai/Mixtral-8x22B-Instruct-v0.1 — accessed 2026-04-20