DeepSeek V3

DeepSeek V3 is DeepSeek's December 2024 flagship open-weights Mixture-of-Experts (MoE) model: 671B total parameters with 37B active per token, trained for a reported ~$5.6M in compute for its final training run. At release it credibly matched GPT-4o on most benchmarks, shocking the market and triggering the wider 'DeepSeek moment' that forced a revaluation of frontier training economics.

Model specs

Vendor
DeepSeek
Family
DeepSeek V3
Released
2024-12
Context window
128,000 tokens
Modalities
text
Input price
$0.27/M tok
Output price
$1.10/M tok
Pricing as of
2026-04-20
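
At the listed rates, the cost of a request is a simple linear function of token counts. A minimal sketch using the prices from the table above (cache-hit discounts and any later price changes are ignored):

```python
# Estimate DeepSeek V3 API cost from the listed per-million-token rates.
INPUT_PRICE_PER_M = 0.27   # USD per 1M input tokens (from the table above)
OUTPUT_PRICE_PER_M = 1.10  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 4,000-token prompt with a 1,000-token completion costs well under a cent:
cost = request_cost(4_000, 1_000)
```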

Strengths

  • Open weights — MIT-licensed for the model; full commercial use
  • GPT-4o class quality on reasoning, coding, and math benchmarks
  • Multi-head Latent Attention (MLA) — compresses the KV cache for drastically cheaper inference
  • Massive community adoption — Together, Fireworks, SambaNova serve it

Limitations

  • 671B footprint — multi-node inference is required for self-hosting
  • Some evaluations report weaker multilingual coverage and safety behavior than Western frontier models
  • Export-control and geopolitical concerns for some enterprise buyers
  • Training data provenance less transparent than Llama or Gemma

Use cases

  • Frontier-quality self-hosted deployments
  • Synthetic data generation for distillation
  • Research baselines for alignment and efficiency work
  • Fine-tuning platforms where weights access is required

Benchmarks

Benchmark    Score   As of
MMLU         ≈88%    2024-12
HumanEval    ≈91%    2024-12
MATH-500     ≈90%    2024-12

Frequently asked questions

What is DeepSeek V3?

A 671B parameter Mixture-of-Experts open-weights LLM from Chinese AI lab DeepSeek, released December 2024. It activates 37B parameters per token and reached GPT-4o class benchmark performance at a reported training cost of ~$5.6M.

Why was DeepSeek V3 such a big deal?

Its combination of frontier quality, open weights, and dramatically lower training cost forced a reassessment of how much compute is actually required for SOTA models. It became the reference point for efficient frontier training.
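
The headline figure is simple to reconstruct: the technical report quotes roughly 2.788M H800 GPU-hours for the full training run, priced at an assumed $2 per GPU-hour rental rate:

```python
# Reported final-run training compute for DeepSeek V3 (per the technical report).
gpu_hours = 2.788e6            # H800 GPU-hours across the full training run
rate_usd_per_hour = 2.0        # rental rate assumed in the report
total_usd = gpu_hours * rate_usd_per_hour   # ≈ $5.58M
```

Note this covers GPU rental for the final run only, not research, ablations, or staff costs.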

Is DeepSeek V3 free to use commercially?

Yes — model weights are released under MIT license, which permits commercial self-hosting. You still pay compute costs, and hosted endpoints via the DeepSeek API or Together/Fireworks are usage-priced.
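
The hosted DeepSeek API follows the OpenAI chat-completions request format, with `deepseek-chat` serving V3. A sketch of the request body (no request is sent here; endpoint URL and headers are noted in comments per DeepSeek's API docs):

```python
import json

# Sketch of a chat-completions request to the hosted DeepSeek API, which
# exposes an OpenAI-compatible endpoint. "deepseek-chat" maps to DeepSeek V3.
payload = {
    "model": "deepseek-chat",
    "messages": [
        {"role": "user", "content": "Summarize MoE routing in one sentence."}
    ],
    "max_tokens": 256,
}
body = json.dumps(payload)
# POST this body to https://api.deepseek.com/chat/completions with an
# "Authorization: Bearer <your API key>" header to receive a completion.
```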

Sources

  1. DeepSeek — DeepSeek V3 Technical Report — accessed 2026-04-20
  2. Hugging Face — deepseek-ai/DeepSeek-V3 — accessed 2026-04-20