Curiosity · AI Model
Qwen QwQ 32B
QwQ 32B is Alibaba Cloud's 2025 open-weights reasoning model — a 32B dense transformer post-trained with reinforcement learning to produce long chain-of-thought traces. Apache 2.0 licensed, it competes with much larger reasoning models like DeepSeek R1 and OpenAI o1 while running on a single H100.
Model specs
- Vendor: Alibaba Cloud
- Family: Qwen QwQ
- Released: 2025-03
- Context window: 131,072 tokens
- Modalities: text
- Input price: $0.15/M tok
- Output price: $0.60/M tok
- Pricing as of: 2026-04-20
Strengths
- Open weights under Apache 2.0 — fully permissive commercial use
- 32B size runs on a single H100 — unusual for frontier reasoning quality
- Competitive with DeepSeek R1 671B on math and code benchmarks
- Transparent RL training recipe published by the Qwen team
Limitations
- Long reasoning traces inflate output token counts, so cost per answer is higher than the low per-token price suggests
- Weaker than Qwen's larger general-purpose models on general chat and creative tasks
- Trails full DeepSeek R1 on the hardest reasoning evaluations
- Reasoning loop can produce repetitive or degenerate traces on edge cases
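The output-token inflation above can be sketched numerically using the listed prices. The token counts are illustrative assumptions, not measured trace lengths:

```python
# Cost of one request at QwQ 32B's listed prices.
INPUT_PRICE = 0.15 / 1_000_000   # $ per input token
OUTPUT_PRICE = 0.60 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A terse model might answer a 1,000-token prompt in 500 tokens;
# a reasoning model can emit several thousand trace tokens first.
terse = request_cost(1_000, 500)        # ≈ $0.00045
reasoning = request_cost(1_000, 8_000)  # ≈ $0.00495

print(f"terse: ${terse:.5f}, reasoning: ${reasoning:.5f}")
print(f"ratio: {reasoning / terse:.1f}x")  # ≈ 11.0x
```

Even at these prices, a 16x longer output makes the reasoning request roughly 11x more expensive, which is why cost-per-answer, not cost-per-token, is the number to compare.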
Use cases
- Self-hosted reasoning assistants for math, logic, science
- Low-cost alternative to DeepSeek R1 for single-GPU deployment
- Research on RL-based reasoning post-training
- Embedded reasoning in agent frameworks where size matters
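For the self-hosted use cases above, a single-GPU deployment can be sketched with vLLM. The flags shown are assumptions about a typical setup, not a verified recipe:

```shell
# Serve QwQ-32B behind an OpenAI-compatible API on one GPU (vLLM).
# --max-model-len caps context at the model's 131,072-token window.
vllm serve Qwen/QwQ-32B \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.95
```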
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| AIME 2024 | ≈79% | 2025-03 |
| MATH-500 | ≈91% | 2025-03 |
| LiveCodeBench | ≈63% | 2025-03 |
Frequently asked questions
What is QwQ 32B?
QwQ 32B is Alibaba Cloud's 32B open-weights reasoning LLM, trained with reinforcement learning to produce long chain-of-thought. It was released in March 2025 under Apache 2.0 and is competitive with OpenAI o1 in a single-GPU footprint.
Is QwQ 32B better than DeepSeek R1?
On some benchmarks QwQ 32B rivals full DeepSeek R1 while being roughly 20x smaller. DeepSeek R1 keeps the edge on the hardest problems and has more thoroughly characterized behavior, but QwQ is far cheaper to self-host.
Can QwQ 32B run on consumer hardware?
Quantized to 4-bit, QwQ 32B fits on a 24 GB consumer GPU (RTX 3090/4090). FP16 weights alone run to roughly 65 GB, so unquantized inference needs an 80 GB H100/H200; 8-bit quantization brings the weights to roughly 33 GB, within reach of a 48 GB card.
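These VRAM figures follow from a back-of-envelope weight-size calculation. The parameter count is approximate, and runtime overheads (KV cache, activations, buffers) are ignored:

```python
# Approximate weight memory for QwQ-32B at common precisions.
PARAMS = 32.5e9  # approximate parameter count

def weight_gb(bits_per_param: float) -> float:
    """Weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_gb(bits):.0f} GB")
# FP16: ~65 GB (needs an 80 GB H100/H200)
# INT8: ~32 GB (fits a 48 GB card)
# INT4: ~16 GB (fits a 24 GB consumer GPU, with headroom for KV cache)
```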
Sources
- Qwen — QwQ 32B blog — accessed 2026-04-20
- Hugging Face — Qwen/QwQ-32B — accessed 2026-04-20