Curiosity · AI Model
Qwen QwQ 32B
QwQ 32B is Alibaba Cloud's 2025 open-weights reasoning model — a 32B dense transformer post-trained with reinforcement learning to produce long chain-of-thought traces. Apache 2.0 licensed, it competes with much larger reasoning models like DeepSeek R1 and OpenAI o1 while running on a single H100.
Model specs
- Vendor: Alibaba Cloud
- Family: Qwen QwQ
- Released: 2025-03
- Context window: 131,072 tokens
- Modalities: text
- Input price: $0.15/M tok
- Output price: $0.60/M tok
- Pricing as of: 2026-04-20
Strengths
- Open weights under Apache 2.0 — fully permissive commercial use
- 32B size runs on a single H100 — unusual for frontier reasoning quality
- Competitive with DeepSeek R1 671B on math and code benchmarks
- Transparent RL training recipe published by the Qwen team
Limitations
- Long reasoning traces inflate output token counts, so cost per answer is higher than the low per-token price suggests
- Weaker than Qwen's larger general-purpose models on general chat and creative tasks
- Trails full DeepSeek R1 on the hardest reasoning evaluations
- Reasoning loop can produce repetitive or degenerate traces on edge cases
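The output-token inflation above can be sketched numerically using the listed prices. The token counts are illustrative assumptions, not measured trace lengths:

```python
# Cost of one request at QwQ 32B's listed prices.
INPUT_PRICE = 0.15 / 1_000_000   # $ per input token
OUTPUT_PRICE = 0.60 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A terse model might answer a 1,000-token prompt in 500 tokens;
# a reasoning model can emit several thousand trace tokens first.
terse = request_cost(1_000, 500)        # ≈ $0.00045
reasoning = request_cost(1_000, 8_000)  # ≈ $0.00495

print(f"terse: ${terse:.5f}, reasoning: ${reasoning:.5f}")
print(f"ratio: {reasoning / terse:.1f}x")  # ≈ 11.0x
```

Even at these prices, a 16x longer output makes the reasoning request roughly 11x more expensive, which is why cost-per-answer, not cost-per-token, is the number to compare.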
Use cases
- Self-hosted reasoning assistants for math, logic, science
- Low-cost alternative to DeepSeek R1 for single-GPU deployment
- Research on RL-based reasoning post-training
- Embedded reasoning in agent frameworks where size matters
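For the self-hosted use cases above, a single-GPU deployment can be sketched with vLLM. The flags shown are assumptions about a typical setup, not a verified recipe:

```shell
# Serve QwQ-32B behind an OpenAI-compatible API on one GPU (vLLM).
# --max-model-len caps context at the model's 131,072-token window.
vllm serve Qwen/QwQ-32B \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.95
```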
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| AIME 2024 | ≈79% | 2025-03 |
| MATH-500 | ≈91% | 2025-03 |
| LiveCodeBench | ≈63% | 2025-03 |
Frequently asked questions
What is QwQ 32B?
QwQ 32B is Alibaba Cloud's 32B open-weights reasoning LLM, trained with reinforcement learning to produce long chain-of-thought. It was released in March 2025 under Apache 2.0 and is competitive with OpenAI o1 in a single-GPU footprint.
Is QwQ 32B better than DeepSeek R1?
On some benchmarks QwQ 32B rivals full DeepSeek R1 while being roughly 20x smaller. DeepSeek R1 keeps the edge on the hardest problems and has more thoroughly characterized behavior, but QwQ is far cheaper to self-host.
Can QwQ 32B run on consumer hardware?
Quantized to 4-bit, QwQ 32B fits on a 24 GB consumer GPU (RTX 3090/4090). FP16 weights alone run to roughly 65 GB, so unquantized inference needs an 80 GB H100/H200; 8-bit quantization brings the weights to roughly 33 GB, within reach of a 48 GB card.
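These VRAM figures follow from a back-of-envelope weight-size calculation. The parameter count is approximate, and runtime overheads (KV cache, activations, buffers) are ignored:

```python
# Approximate weight memory for QwQ-32B at common precisions.
PARAMS = 32.5e9  # approximate parameter count

def weight_gb(bits_per_param: float) -> float:
    """Weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_gb(bits):.0f} GB")
# FP16: ~65 GB (needs an 80 GB H100/H200)
# INT8: ~32 GB (fits a 48 GB card)
# INT4: ~16 GB (fits a 24 GB consumer GPU, with headroom for KV cache)
```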
Sources
- Qwen — QwQ 32B blog — accessed 2026-04-20
- Hugging Face — Qwen/QwQ-32B — accessed 2026-04-20