Phi-4
Phi-4 is Microsoft Research's late-2024 open-weights small language model — a 14B dense transformer trained with a heavy synthetic-data curriculum that lifts reasoning quality well above its size class. Released under MIT license, it targets the 'small model, big reasoning' niche pioneered by the Phi series.
Model specs
- Vendor
- Microsoft
- Family
- Phi
- Released
- 2024-12
- Context window
- 16K (16,384) tokens
- Modalities
- text
- Input price
- $0.07/M tok
- Output price
- $0.07/M tok
- Pricing as of
- 2026-04-20
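At the listed rates, per-request cost is simple arithmetic. A quick sketch using the prices from the table above (actual rates vary by host, so treat these constants as an example, not a quote):

```python
# Rough cost estimate for Phi-4 API usage at the listed rates.
INPUT_PRICE_PER_M = 0.07   # USD per 1M input tokens (from the spec table)
OUTPUT_PRICE_PER_M = 0.07  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M +
            output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500):.6f}")  # → $0.000175
```

Because input and output are priced identically here, cost depends only on total tokens; for models with asymmetric pricing the split matters.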
Strengths
- Open weights under MIT license — fully permissive commercial use
- Punches above its weight on reasoning — competitive with 70B-class models on math
- Runs on a single 24GB consumer GPU with 8-bit or 4-bit quantization (FP16 weights alone need roughly 28GB)
- Transparent training recipe centered on high-quality synthetic data
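The single-GPU claim above comes down to bytes per parameter. A back-of-the-envelope sketch of weight memory for a ~14B model (weights only; the KV cache and activations add more on top):

```python
# Approximate weight-memory footprint of a 14B-parameter model at
# common precisions. Weights only: KV cache and activations are extra.
PARAMS = 14e9  # Phi-4 parameter count (approximate)

def weight_gb(bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bytes_per_param / 1e9

print(f"FP16: {weight_gb(2):.0f} GB")    # ~28 GB: exceeds a 24GB card
print(f"INT8: {weight_gb(1):.0f} GB")    # ~14 GB: fits with headroom
print(f"INT4: {weight_gb(0.5):.0f} GB")  # ~7 GB: fits easily
```

This is why quantization, not FP16, is the practical path to running Phi-4 on a 24GB consumer card.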
Limitations
- Only 16K context — short compared to 128K+ norms in 2026
- Text-only — no vision or audio modalities
- Narrow strength profile — weaker on creative writing and open-ended chat
- Synthetic-data reliance raises concerns about memorization and generalization
Use cases
- On-device assistants focused on math, logic, and coding
- Edge deployments where 14B is the ceiling
- Research on synthetic-data training curricula
- Distillation target for larger teacher models
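The distillation use case above typically minimizes a temperature-scaled KL divergence between teacher and student output distributions. A minimal NumPy sketch of that loss, with made-up logits standing in for real model outputs (this is an illustration of the standard technique, not Phi-4's actual training pipeline):

```python
import numpy as np

def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Temperature-softened softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits: np.ndarray,
                      student_logits: np.ndarray,
                      T: float = 2.0) -> float:
    """Mean KL(teacher || student) over temperature-softened
    distributions, scaled by T^2 as in the classic knowledge-
    distillation formulation."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl)) * T * T

# Identical logits give zero loss; mismatched logits give positive loss.
same = np.array([[2.0, 1.0, 0.1]])
diff = np.array([[0.1, 1.0, 2.0]])
print(distillation_loss(same, same))      # → 0.0
print(distillation_loss(same, diff) > 0)  # → True
```

In practice the student also sees a weighted hard-label cross-entropy term alongside this soft-target loss.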
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU | ≈85% | 2024-12 |
| GSM8K | ≈91% | 2024-12 |
| HumanEval | ≈83% | 2024-12 |
Frequently asked questions
What is Phi-4?
Phi-4 is Microsoft Research's 14B-parameter open-weights language model, released in December 2024 under the MIT license. It is the fourth generation of the Phi 'small language model' series, trained with a heavy emphasis on synthetic reasoning and math data.
Why is Phi-4 so small yet good at reasoning?
The Phi training recipe leans hard on curated synthetic data emphasizing step-by-step reasoning. The hypothesis is that data quality and reasoning emphasis compensate for raw parameter count on logic benchmarks.
Should I use Phi-4 or Llama 3.1 8B?
Phi-4 wins on reasoning and math benchmarks for its size. Llama 3.1 8B wins on ecosystem, multilingual coverage, and 128K context. Choose Phi-4 for reasoning-centric work, Llama for breadth.
Sources
- Microsoft Research — Phi-4 Technical Report — accessed 2026-04-20
- Hugging Face — microsoft/phi-4 — accessed 2026-04-20