Curiosity · AI Model
Qwen 2.5 3B
Qwen 2.5 3B is one of the smallest members of Alibaba's Qwen 2.5 open-weight LLM family, released in September 2024. The 3-billion-parameter model ships with a 32k-token context window, tool-use-aware instruction tuning, and multilingual coverage across 29 languages. It significantly outperforms Qwen 2 at the same scale on reasoning, coding, and math, thanks to Qwen's redesigned 18T-token training corpus. Unlike most of its siblings, the 3B variant is released under the Qwen Research licence rather than Apache 2.0, yet it has quickly become a common choice for edge agents and fine-tuning baselines.
Model specs
- Vendor
- Alibaba / Qwen team
- Family
- Qwen 2.5
- Released
- 2024-09
- Context window
- 32,768 tokens
- Modalities
- text
Strengths
- Strong quality-per-parameter, especially on code and math
- Openly available weights (Qwen Research licence — note the 3B variant is not Apache 2.0 like most of the family)
- Tool-use-aware instruction tuning
- Full 32k context with up to 8k-token generation
Limitations
- Trails 7B+ models on deep reasoning
- 32k context, vs 128k (via YaRN) on the larger Qwen 2.5 variants
- No built-in vision (see Qwen2-VL for multimodal)
- Chain-of-thought depth limited vs larger Qwen 2.5 variants
- Smaller safety-tuning dataset than 72B
Use cases
- Edge and on-device agents
- Fine-tuning baseline at 3B scale
- Multilingual chat in 29 languages
- RAG over long documents within the 32k window
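For the agent and chat use cases above, inference follows the standard Hugging Face transformers chat flow. The sketch below assumes the `Qwen/Qwen2.5-3B-Instruct` checkpoint on the Hub; the system prompt and generation settings are illustrative, not prescribed by the model card.

```python
# Minimal chat-inference sketch for Qwen 2.5 3B (Instruct checkpoint).
MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"

def build_messages(user_prompt: str) -> list[dict]:
    """Assemble a conversation in the messages format used by apply_chat_template."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the message helper above stays usable without torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(user_prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For edge deployment the same checkpoint is commonly run through quantized runtimes instead, which trade some quality for a much smaller memory footprint.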
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU (5-shot) | ≈66% | 2024-09 |
| HumanEval | ≈70% | 2024-09 |
| MATH | ≈43% | 2024-09 |
Frequently asked questions
What is Qwen 2.5 3B?
A 3B-parameter open-weight LLM from Alibaba's Qwen 2.5 family, released in September 2024 under the Qwen Research licence, with a 32k context window and multilingual coverage across 29 languages.
How does it compare to Phi-3-mini?
They are similarly sized competitors. Qwen 2.5 3B is stronger on multilingual tasks and coding, while Phi-3-mini sometimes leads on English reasoning at a comparable parameter count.
Is Qwen 2.5 3B multimodal?
No — it is text-only. For vision, use Qwen2-VL.
Sources
- Qwen 2.5 blog — accessed 2026-04-20
- Qwen 2.5 3B on Hugging Face — accessed 2026-04-20