Curiosity · AI Model
Qwen 2.5 3B
Qwen 2.5 3B is one of the smallest members of Alibaba's Qwen 2.5 open-weight LLM family, released in September 2024. The 3-billion-parameter model ships with a 32k-token context window, tool-use-aware instruction tuning, and multilingual coverage across 29 languages. It significantly outperforms Qwen 2 at the same scale on reasoning, coding, and math, thanks to Qwen's redesigned 18T-token training corpus. Unlike most of its siblings, the 3B variant is released under the Qwen Research licence rather than Apache 2.0, yet it has quickly become a common choice for edge agents and fine-tuning baselines.
Model specs
- Vendor
- Alibaba / Qwen team
- Family
- Qwen 2.5
- Released
- 2024-09
- Context window
- 32,768 tokens
- Modalities
- text
Strengths
- Strong quality-per-parameter, especially on code and math
- Openly available weights (Qwen Research licence — note the 3B variant is not Apache 2.0 like most of the family)
- Tool-use-aware instruction tuning
- Full 32k context with up to 8k-token generation
Limitations
- Trails 7B+ models on deep reasoning
- 32k context, vs 128k (via YaRN) on the larger Qwen 2.5 variants
- No built-in vision (see Qwen2-VL for multimodal)
- Chain-of-thought depth limited vs larger Qwen 2.5 variants
- Smaller safety-tuning dataset than 72B
Use cases
- Edge and on-device agents
- Fine-tuning baseline at 3B scale
- Multilingual chat in 29 languages
- RAG over long documents within the 32k window
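For the agent and chat use cases above, inference follows the standard Hugging Face transformers chat flow. The sketch below assumes the `Qwen/Qwen2.5-3B-Instruct` checkpoint on the Hub; the system prompt and generation settings are illustrative, not prescribed by the model card.

```python
# Minimal chat-inference sketch for Qwen 2.5 3B (Instruct checkpoint).
MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"

def build_messages(user_prompt: str) -> list[dict]:
    """Assemble a conversation in the messages format used by apply_chat_template."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the message helper above stays usable without torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(user_prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For edge deployment the same checkpoint is commonly run through quantized runtimes instead, which trade some quality for a much smaller memory footprint.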
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU (5-shot) | ≈66% | 2024-09 |
| HumanEval | ≈70% | 2024-09 |
| MATH | ≈43% | 2024-09 |
Frequently asked questions
What is Qwen 2.5 3B?
A 3B-parameter open-weight LLM from Alibaba's Qwen 2.5 family, released in September 2024 under the Qwen Research licence, with a 32k context window and multilingual coverage across 29 languages.
How does it compare to Phi-3-mini?
They are similarly sized competitors. Qwen 2.5 3B is stronger on multilingual tasks and coding, while Phi-3-mini sometimes leads on English reasoning at a comparable parameter count.
Is Qwen 2.5 3B multimodal?
No — it is text-only. For vision, use Qwen2-VL.
Sources
- Qwen 2.5 blog — accessed 2026-04-20
- Qwen 2.5 3B on Hugging Face — accessed 2026-04-20