Mixtral 8x22B
Mixtral 8x22B is Mistral AI's April 2024 flagship open-weights mixture-of-experts (MoE) model: 141B total parameters with roughly 39B active per token, released under the unambiguously permissive Apache 2.0 license for production use. Together with the earlier Mixtral 8x7B, it helped define the open MoE category before DeepSeek V3 and Llama 4 took it further.
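As a rough illustration of how a sparse MoE layer keeps most weights idle per token, here is a toy top-2 routing sketch in plain NumPy. The 8-expert, 2-active shape matches Mixtral's published design, but the dimensions and weights are made up and this is not Mistral's implementation.

```python
import numpy as np

def top2_moe_layer(x, router_w, expert_ws, top_k=2):
    """Toy top-k MoE feed-forward: route each token to its top-k experts.

    x:         (tokens, d_model) activations
    router_w:  (d_model, num_experts) router weights
    expert_ws: list of num_experts (d_model, d_model) toy expert weights
    """
    logits = x @ router_w                          # (tokens, num_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the k best experts
    # softmax over the selected logits only, as in Mixtral-style routing
    sel = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(sel - sel.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # per token, only k experts run
        for j in range(top_k):
            e = top[t, j]
            out[t] += gates[t, j] * (x[t] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
d_model, num_experts = 64, 8                       # 8 experts, 2 active, like Mixtral
x = rng.standard_normal((4, d_model))
router_w = rng.standard_normal((d_model, num_experts))
expert_ws = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
print(top2_moe_layer(x, router_w, expert_ws).shape)   # (4, 64)
```

Because only two of the eight expert matrices multiply each token, per-token compute scales with the ~39B active parameters rather than the 141B total.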
Model specs
| Spec | Value |
|---|---|
| Vendor | Mistral AI |
| Family | Mixtral |
| Released | 2024-04 |
| Context window | 65,536 tokens |
| Modalities | Text |
| Input price | $1.20 per 1M tokens |
| Output price | $1.20 per 1M tokens |
| Pricing as of | 2026-04-20 |
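To make the listed rates concrete, a quick back-of-envelope cost estimate follows. The traffic profile is hypothetical; only the $1.20-per-million-token rate comes from the table above.

```python
# Rough cost estimate from the listed hosted rates ($1.20 per 1M tokens, input and output).
# The traffic profile below is a made-up example, not a benchmark.
PRICE_PER_M_TOKENS = 1.20  # USD, same rate for input and output per the specs table

requests_per_day = 50_000          # hypothetical workload
input_tokens_per_request = 800
output_tokens_per_request = 300

daily_tokens = requests_per_day * (input_tokens_per_request + output_tokens_per_request)
daily_cost = daily_tokens / 1_000_000 * PRICE_PER_M_TOKENS
print(f"~{daily_tokens/1e6:.0f}M tokens/day -> ${daily_cost:,.2f}/day, ${daily_cost*30:,.2f}/30 days")
```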
Strengths
- Apache 2.0 — fully permissive open license for commercial use
- MoE architecture — 141B total but only 39B active per token
- Strong multilingual performance — native European language coverage
- Established in many inference stacks — vLLM, TensorRT-LLM, Together, Fireworks (see the serving sketch below)
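Since vLLM is one of the stacks listed above, a minimal self-hosted serving sketch follows. The checkpoint id matches the Hugging Face card in the sources; the tensor-parallel degree and sampling settings are assumptions to adjust for your hardware.

```python
# Minimal vLLM sketch for serving Mixtral-8x22B-Instruct (assumes a multi-GPU node;
# the full-precision weights are far too large for a single GPU).
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x22B-Instruct-v0.1",
    tensor_parallel_size=8,   # assumption: 8 GPUs; size this for your hardware
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarise the Apache 2.0 license in one sentence."], params)
print(outputs[0].outputs[0].text)
```

A production deployment would more likely run vLLM's OpenAI-compatible server and apply the model's chat template rather than passing raw prompts.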
Limitations
- Now trails DeepSeek V3, Llama 4, and Qwen 2.5 on most benchmarks
- 65K context — modest compared to 128K–1M windows in 2026
- MoE memory footprint is large even with efficient runtime
- Successors Mistral Large 2 and Mistral Large 3 are not released under permissive open licenses
Use cases
- High-throughput production chat on self-hosted infra
- Multilingual workloads across French, German, Spanish, Italian
- Fine-tuning base for domain models needing MoE efficiency (see the LoRA sketch after this list)
- Coding pipelines where Apache 2.0 license matters
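For the fine-tuning use case above, full fine-tuning of a 141B MoE is rarely practical, so a parameter-efficient LoRA setup is a common starting point. The sketch below uses Hugging Face transformers and peft; the rank, target modules, and dtype are assumptions, not a recipe from Mistral.

```python
# Hedged LoRA fine-tuning sketch for the Mixtral-8x22B base checkpoint.
# Rank, target modules, and dtype are assumptions; adjust for your hardware and data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mixtral-8x22B-v0.1"   # base (non-instruct) checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",          # shards across available GPUs; ~280 GB in bf16
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # a small fraction of the 141B total
```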
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU | ≈77% | 2024-04 |
| HumanEval | ≈76% | 2024-04 |
| MATH | ≈41% | 2024-04 |
Frequently asked questions
What is Mixtral 8x22B?
A Mixture-of-Experts open-weights LLM from Mistral AI with 8 experts of roughly 22B each, totaling 141B parameters but only activating ~39B per token. Released April 2024 under Apache 2.0.
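The 141B-total versus ~39B-active split follows from top-2 routing over 8 experts plus a shared backbone (attention, embeddings). The snippet below just solves those two published numbers for the implied per-expert and shared sizes; it is a back-of-envelope estimate, not an official breakdown.

```python
# Back-of-envelope split implied by the published totals (141B total, ~39B active,
# 8 experts with top-2 routing). Not an official parameter breakdown.
total_b, active_b, n_experts, top_k = 141, 39, 8, 2

# total  = shared + n_experts * per_expert
# active = shared + top_k     * per_expert
per_expert = (total_b - active_b) / (n_experts - top_k)
shared = total_b - n_experts * per_expert
print(f"~{per_expert:.0f}B per expert FFN, ~{shared:.0f}B shared (attention, embeddings, etc.)")
# -> ~17B per expert, ~5B shared
```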
Is Mixtral 8x22B still worth deploying?
For Apache 2.0 requirements and multilingual European workloads, yes. But for general quality-per-dollar, DeepSeek V3 or Llama 4 Maverick are now stronger open choices.
What's the difference between Mixtral 8x7B and 8x22B?
Both are Mistral MoE models, but 8x7B (46B total) targets smaller deployments while 8x22B (141B total) is the flagship. 8x22B is meaningfully stronger on reasoning, coding, and math.
Sources
- Mistral AI — Cheaper, Better, Faster, Stronger (Mixtral 8x22B) — accessed 2026-04-20
- Hugging Face — mistralai/Mixtral-8x22B-Instruct-v0.1 — accessed 2026-04-20