Phi-4
Phi-4 is Microsoft Research's late-2024 open-weights small language model — a 14B dense transformer trained with a heavy synthetic-data curriculum that lifts reasoning quality well above its size class. Released under MIT license, it targets the 'small model, big reasoning' niche pioneered by the Phi series.
Model specs
- Vendor
- Microsoft
- Family
- Phi
- Released
- 2024-12
- Context window
- 16K (16,384) tokens
- Modalities
- text
- Input price
- $0.07/M tok
- Output price
- $0.07/M tok
- Pricing as of
- 2026-04-20
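At the listed rates, per-request cost is simple arithmetic. A quick sketch using the prices from the table above (actual rates vary by host, so treat these constants as an example, not a quote):

```python
# Rough cost estimate for Phi-4 API usage at the listed rates.
INPUT_PRICE_PER_M = 0.07   # USD per 1M input tokens (from the spec table)
OUTPUT_PRICE_PER_M = 0.07  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M +
            output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500):.6f}")  # → $0.000175
```

Because input and output are priced identically here, cost depends only on total tokens; for models with asymmetric pricing the split matters.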
Strengths
- Open weights under MIT license — fully permissive commercial use
- Punches above its weight on reasoning — competitive with 70B-class models on math
- Runs on a single 24GB consumer GPU with 8-bit or 4-bit quantization (FP16 weights alone need roughly 28GB)
- Transparent training recipe centered on high-quality synthetic data
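The single-GPU claim above comes down to bytes per parameter. A back-of-the-envelope sketch of weight memory for a ~14B model (weights only; the KV cache and activations add more on top):

```python
# Approximate weight-memory footprint of a 14B-parameter model at
# common precisions. Weights only: KV cache and activations are extra.
PARAMS = 14e9  # Phi-4 parameter count (approximate)

def weight_gb(bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bytes_per_param / 1e9

print(f"FP16: {weight_gb(2):.0f} GB")    # ~28 GB: exceeds a 24GB card
print(f"INT8: {weight_gb(1):.0f} GB")    # ~14 GB: fits with headroom
print(f"INT4: {weight_gb(0.5):.0f} GB")  # ~7 GB: fits easily
```

This is why quantization, not FP16, is the practical path to running Phi-4 on a 24GB consumer card.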
Limitations
- Only 16K context — short compared to 128K+ norms in 2026
- Text-only — no vision or audio modalities
- Narrow strength profile — weaker on creative writing and open-ended chat
- Synthetic-data reliance raises concerns about memorization and generalization
Use cases
- On-device assistants focused on math, logic, and coding
- Edge deployments where 14B is the ceiling
- Research on synthetic-data training curricula
- Distillation target for larger teacher models
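The distillation use case above typically minimizes a temperature-scaled KL divergence between teacher and student output distributions. A minimal NumPy sketch of that loss, with made-up logits standing in for real model outputs (this is an illustration of the standard technique, not Phi-4's actual training pipeline):

```python
import numpy as np

def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Temperature-softened softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits: np.ndarray,
                      student_logits: np.ndarray,
                      T: float = 2.0) -> float:
    """Mean KL(teacher || student) over temperature-softened
    distributions, scaled by T^2 as in the classic knowledge-
    distillation formulation."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl)) * T * T

# Identical logits give zero loss; mismatched logits give positive loss.
same = np.array([[2.0, 1.0, 0.1]])
diff = np.array([[0.1, 1.0, 2.0]])
print(distillation_loss(same, same))      # → 0.0
print(distillation_loss(same, diff) > 0)  # → True
```

In practice the student also sees a weighted hard-label cross-entropy term alongside this soft-target loss.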
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU | ≈85% | 2024-12 |
| GSM8K | ≈91% | 2024-12 |
| HumanEval | ≈83% | 2024-12 |
Frequently asked questions
What is Phi-4?
Phi-4 is Microsoft Research's 14B-parameter open-weights language model, released in December 2024 under the MIT license. It is the fourth generation of the Phi 'small language model' series, trained with a heavy emphasis on synthetic reasoning and math data.
Why is Phi-4 so small yet good at reasoning?
The Phi training recipe leans hard on curated synthetic data emphasizing step-by-step reasoning. The hypothesis is that data quality and reasoning emphasis compensate for raw parameter count on logic benchmarks.
Should I use Phi-4 or Llama 3.1 8B?
Phi-4 wins on reasoning and math benchmarks for its size. Llama 3.1 8B wins on ecosystem, multilingual coverage, and 128K context. Choose Phi-4 for reasoning-centric work, Llama for breadth.
Sources
- Microsoft Research — Phi-4 Technical Report — accessed 2026-04-20
- Hugging Face — microsoft/phi-4 — accessed 2026-04-20