Curiosity · AI Model

Phi-4

Phi-4 is Microsoft Research's late-2024 open-weights small language model: a 14B dense transformer trained with a heavy synthetic-data curriculum that lifts reasoning quality well above its size class. Released under the MIT license, it targets the 'small model, big reasoning' niche pioneered by the Phi series.

Model specs

Vendor
Microsoft
Family
Phi
Released
2024-12
Context window
16K (16,384) tokens
Modalities
text
Input price
$0.07/M tok
Output price
$0.07/M tok
Pricing as of
2026-04-20
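
At the listed $0.07 per million tokens for both input and output, per-request cost is simple arithmetic. A minimal sketch (the token counts in the example are illustrative, and hosted pricing varies by provider):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float = 0.07, output_price: float = 0.07) -> float:
    """Estimate request cost in USD, given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# A 10K-token prompt with a 2K-token reply:
print(f"${estimate_cost(10_000, 2_000):.6f}")  # $0.000840
```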

Strengths

  • Open weights under MIT license — fully permissive commercial use
  • Punches above weight on reasoning — competitive with 70B class on math
  • Fits on a single 24GB consumer GPU with 8-bit quantization (FP16 weights alone need ~28GB)
  • Transparent training recipe centered on high-quality synthetic data
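
The single-GPU claim follows from back-of-envelope weight-memory math: 14B parameters at 2 bytes each is ~28GB, so a 24GB card needs 8-bit or lower precision. A rough sketch (weights only; KV cache and activations add more):

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB."""
    return params_billion * bytes_per_param  # billions of params x bytes each = GB

for name, bpp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(14, bpp):.0f} GB")
# FP16: ~28 GB (exceeds a 24 GB card)
# INT8: ~14 GB
# INT4: ~7 GB
```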

Limitations

  • Only 16K context — short compared to 128K+ norms in 2026
  • Text-only — no vision or audio modalities
  • Narrow strength profile — weaker on creative writing and open-ended chat
  • Synthetic-data reliance raises concerns about memorization and generalization

Use cases

  • On-device assistants focused on math, logic, and coding
  • Edge deployments where 14B is the ceiling
  • Research on synthetic-data training curricula
  • Distillation target for larger teacher models

Benchmarks

Benchmark    Score    As of
MMLU         ≈85%     2024-12
GSM8K        ≈91%     2024-12
HumanEval    ≈83%     2024-12

Frequently asked questions

What is Phi-4?

Phi-4 is Microsoft Research's 14B open-weights LLM released December 2024 under MIT license. It's the latest in the Phi 'small language model' series, trained with a heavy emphasis on synthetic reasoning and math data.

Why is Phi-4 so small yet good at reasoning?

The Phi training recipe leans hard on curated synthetic data emphasizing step-by-step reasoning. The hypothesis is that data quality and reasoning emphasis compensate for raw parameter count on logic benchmarks.

Should I use Phi-4 or Llama 3.1 8B?

Phi-4 wins on reasoning and math benchmarks for its size. Llama 3.1 8B wins on ecosystem, multilingual coverage, and 128K context. Choose Phi-4 for reasoning-centric work, Llama for breadth.

Sources

  1. Microsoft Research — Phi-4 Technical Report — accessed 2026-04-20
  2. Hugging Face — microsoft/phi-4 — accessed 2026-04-20