Curiosity · AI Model

Qwen 2.5 3B

Qwen 2.5 3B is one of the smallest members of Alibaba's Qwen 2.5 open LLM family, released in September 2024. The 3-billion-parameter model ships with a 32k context window, tool-use-aware instruction tuning, and multilingual coverage across 29+ languages. It significantly outperforms Qwen 2 at the same scale on reasoning, coding, and math, thanks to Qwen 2.5's expanded 18T-token training corpus. Unlike most other sizes in the family, which are Apache 2.0, the 3B weights are released under the Qwen Research Licence; it has nonetheless become a common choice for edge agents and fine-tuning baselines.
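The edge-deployment appeal is largely a memory question. A back-of-envelope sketch of the weight footprint at common precisions, assuming roughly 3.1B parameters (a rounded figure; KV cache, activations, and runtime overhead are ignored):

```python
# Rough memory needed just for the weights of a ~3B-parameter dense model
# at common precisions. Back-of-envelope only: ignores KV cache,
# activations, and runtime overhead.

PARAMS = 3.1e9  # assumption: Qwen 2.5 3B rounded to ~3.1B parameters

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,  # native half precision
    "int8": 1.0,       # 8-bit quantisation
    "int4": 0.5,       # 4-bit quantisation (e.g. GPTQ/AWQ-style)
}

def weight_footprint_gib(params: float, bytes_per_param: float) -> float:
    """Weight memory in GiB for a dense model at a given precision."""
    return params * bytes_per_param / 2**30

for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:>10}: {weight_footprint_gib(PARAMS, nbytes):.1f} GiB")
```

At 4-bit the weights fit in well under 2 GiB, which is why 3B-class models are a popular target for laptops and phones.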

Model specs

Vendor: Alibaba / Qwen team
Family: Qwen 2.5
Released: 2024-09
Context window: 32,768 tokens
Modalities: text

Strengths

  • Strong quality-per-parameter, especially on code and math
  • Open weights (Qwen Research Licence; most other Qwen 2.5 sizes are Apache 2.0)
  • Tool-use-aware instruction tuning
  • 32k context window

Limitations

  • Trails 7B+ models on deep reasoning
  • No built-in vision (see Qwen2-VL for multimodal)
  • Chain-of-thought depth limited vs larger Qwen 2.5 variants
  • Smaller safety-tuning dataset than 72B

Use cases

  • Edge and on-device agents
  • Fine-tuning baseline at 3B scale
  • Multilingual chat in 29 languages
  • RAG over long documents within the 32k context window
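For chat and agent use cases, Qwen 2.5's instruct models use a ChatML-style turn format with `<|im_start|>` / `<|im_end|>` markers. A minimal sketch of rendering such a prompt by hand (the markers and role names match the published chat template; the message contents are illustrative — in practice you would let the tokenizer's chat template do this):

```python
def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts into ChatML-style text,
    as used by Qwen instruct models."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Trailing assistant header cues the model to begin its reply here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise this document in one sentence."},
])
print(prompt)
```

With the Hugging Face `transformers` library, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` produces the equivalent string from the template shipped with the model, which is the safer route in real code.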

Benchmarks

Benchmark        Score   As of
MMLU (5-shot)    ≈66%    2024-09
HumanEval        ≈70%    2024-09
MATH             ≈43%    2024-09

Frequently asked questions

What is Qwen 2.5 3B?

A 3B-parameter open-weight LLM from Alibaba's Qwen 2.5 family, released September 2024 under the Qwen Research Licence, with a 32k context window and multilingual coverage across 29+ languages.

How does it compare to Phi-3-mini?

They are similarly sized competitors. Qwen 2.5 3B is stronger on multilingual tasks and coding, while Phi-3-mini sometimes leads on English reasoning at comparable size.

Is Qwen 2.5 3B multimodal?

No — it is text-only. For vision, use Qwen2-VL.

Sources

  1. Qwen 2.5 blog — accessed 2026-04-20
  2. Qwen 2.5 3B on Hugging Face — accessed 2026-04-20