Curiosity · AI Model
DeepSeek V3
DeepSeek V3 is DeepSeek's December 2024 flagship open-weights Mixture-of-Experts (MoE) model — 671B total parameters with 37B active per token, trained for a reported ~$5.6M in compute. At release it credibly matched GPT-4o on most benchmarks, a result that surprised the market and helped trigger the wider 'DeepSeek moment' that revalued frontier training economics.
Model specs
- Vendor
- DeepSeek
- Family
- DeepSeek V3
- Released
- 2024-12
- Context window
- 128,000 tokens
- Modalities
- text
- Input price
- $0.27/M tok
- Output price
- $1.10/M tok
- Pricing as of
- 2026-04-20
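The listed rates make per-request cost easy to estimate. A minimal sketch using the card's prices; the request sizes in the example are illustrative assumptions, not typical workloads:

```python
# Rates from this card (as of 2026-04-20), in USD per 1M tokens.
INPUT_PER_M = 0.27
OUTPUT_PER_M = 1.10

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one API request at the card's listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: a 4,000-token prompt with a 1,000-token completion.
cost = request_cost(4_000, 1_000)
```

At these rates, the example request costs roughly a fifth of a cent, which is why the model is often used for high-volume synthetic data generation.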
Strengths
- Open weights — MIT-licensed for the model; full commercial use
- GPT-4o class quality on reasoning, coding, and math benchmarks
- Multi-head Latent Attention (MLA) — compresses the KV cache for drastically cheaper inference
- Massive community adoption — Together, Fireworks, SambaNova serve it
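The MLA strength above comes from caching a small latent vector per token instead of full per-head keys and values. A hedged NumPy sketch of that core idea — the dimensions are toy values chosen for illustration, not DeepSeek V3's actual sizes:

```python
import numpy as np

# Illustrative (assumed) dimensions; real MLA uses much larger sizes.
d_model, d_latent, n_heads, d_head = 256, 32, 4, 64

rng = np.random.default_rng(0)
W_down = rng.normal(size=(d_model, d_latent)) * 0.02          # compress hidden state
W_up_k = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02  # expand latent to keys
W_up_v = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02  # expand latent to values

h = rng.normal(size=(10, d_model))   # hidden states for 10 cached tokens
latent = h @ W_down                  # (10, d_latent) -- this is all that gets cached
k = latent @ W_up_k                  # keys reconstructed on demand at attention time
v = latent @ W_up_v                  # values reconstructed the same way

# Cache cost drops from 2 * n_heads * d_head = 512 floats per token
# (separate K and V) to d_latent = 32 floats per token.
```

The up-projections add a little compute per step, but the KV-cache memory savings dominate at long context lengths, which is what makes serving a 128K-context model economical.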
Limitations
- 671B footprint — multi-node inference required for self-hosting
- Some evaluation reports show weaker multilingual and safety behavior than Western frontier models
- Export-control and geopolitical concerns for some enterprise buyers
- Training data provenance less transparent than Llama or Gemma
Use cases
- Frontier-quality self-hosted deployments
- Synthetic data generation for distillation
- Research baselines for alignment and efficiency work
- Fine-tuning platforms where weights access is required
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU | ≈88% | 2024-12 |
| HumanEval | ≈91% | 2024-12 |
| MATH-500 | ≈90% | 2024-12 |
Frequently asked questions
What is DeepSeek V3?
A 671B-parameter Mixture-of-Experts open-weights LLM from Chinese AI lab DeepSeek, released December 2024. It activates 37B parameters per token and reached GPT-4o class benchmark performance at a reported training cost of ~$5.6M.
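The 671B-total / 37B-active split works because a router sends each token through only a few experts. A minimal top-k routing sketch with toy sizes (the expert count, k, and dimensions are assumptions for illustration, not the model's real configuration):

```python
import numpy as np

# Toy MoE layer: 8 experts, route each token to the top 2.
n_experts, k, d = 8, 2, 16
rng = np.random.default_rng(1)
gate_w = rng.normal(size=(d, n_experts))                      # router weights
experts = [rng.normal(size=(d, d)) * 0.05 for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Run one token through the k highest-scoring experts only."""
    logits = x @ gate_w
    top = np.argsort(logits)[-k:]              # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the selected experts
    # Only k of the n_experts weight matrices are touched per token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.normal(size=d))
```

Per-token compute scales with k, not with the total number of experts, which is why total parameter count and active parameter count diverge so sharply.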
Why was DeepSeek V3 such a big deal?
Its combination of frontier quality, open weights, and dramatically lower training cost forced a reassessment of how much compute is actually required for SOTA models. It became the reference point for efficient frontier training.
Is DeepSeek V3 free to use commercially?
Yes — model weights are released under MIT license, which permits commercial self-hosting. You still pay compute costs, and hosted endpoints via the DeepSeek API or Together/Fireworks are usage-priced.
Sources
- DeepSeek — DeepSeek V3 Technical Report — accessed 2026-04-20
- Hugging Face — deepseek-ai/DeepSeek-V3 — accessed 2026-04-20