Qwen2-VL 72B
Qwen2-VL 72B is Alibaba's flagship open vision-language model, released in September 2024 as part of the Qwen 2 family. It introduces a Naive Dynamic Resolution visual encoder and M-RoPE (multimodal rotary position embeddings), which let it process images at arbitrary resolution and videos up to roughly 20 minutes long without resizing them to a fixed input size. The 72B instruct variant is competitive with GPT-4o on MMMU, DocVQA, and ChartQA, and its weights can be downloaded from Hugging Face under the Qwen licence.
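To make the M-RoPE idea concrete, here is a minimal sketch of how three-component position IDs (temporal, height, width) could be assigned to a mixed text-and-image sequence. The function names are hypothetical and the scheme is simplified for illustration; it is not the model's actual implementation.

```python
from typing import List, Tuple

PositionId = Tuple[int, int, int]  # (temporal, height, width)


def text_positions(start: int, length: int) -> List[PositionId]:
    """Text tokens: all three components share one 1D index."""
    return [(start + i, start + i, start + i) for i in range(length)]


def visual_positions(start: int, frames: int, rows: int, cols: int) -> List[PositionId]:
    """Visual tokens: temporal id per frame, height/width ids per grid cell.

    An image is the frames == 1 case, so its temporal id stays constant.
    """
    return [
        (start + t, start + r, start + c)
        for t in range(frames)
        for r in range(rows)
        for c in range(cols)
    ]


def next_start(positions: List[PositionId]) -> int:
    """The next segment starts one past the largest id used so far."""
    return max(max(p) for p in positions) + 1


# Three text tokens, then a 2x2 grid of image tokens, then two more text tokens.
prompt = text_positions(0, 3)
image = visual_positions(next_start(prompt), frames=1, rows=2, cols=2)
answer = text_positions(next_start(image), 2)
```

In this sketch a video is simply the `frames > 1` case: the temporal component increments once per frame while the height and width components repeat the same grid pattern.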
Model specs
- Vendor: Alibaba / Qwen team
- Family: Qwen 2
- Released: 2024-09
- Context window: 32,768 tokens
- Modalities: text, vision, video
Strengths
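- Dynamic-resolution encoder avoids resizing images to a fixed input resolution; the number of visual tokens scales with image size
- M-RoPE encodes temporal, height, and width positions separately, which helps temporal understanding for video
- Competitive with GPT-4o on benchmarks such as DocVQA, MathVista, and Video-MME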
- Open weights for research and most commercial use
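As a rough illustration of what dynamic resolution means for token cost, the sketch below estimates the number of visual tokens for one image. It assumes 14-pixel patches merged 2×2 into one token (so each token covers about 28×28 pixels); these values and the function name are assumptions not stated on this page, and the real preprocessor also applies minimum and maximum pixel limits that are ignored here.

```python
def approx_visual_tokens(height: int, width: int, patch: int = 14, merge: int = 2) -> int:
    """Rough visual-token count for one image (excludes any start/end marker tokens)."""
    unit = patch * merge  # pixels covered by one merged token along each axis
    # Snap each side to a whole number of units, keeping at least one unit per side.
    rows = max(1, round(height / unit))
    cols = max(1, round(width / unit))
    return rows * cols


small = approx_visual_tokens(224, 224)    # 8 x 8 grid
large = approx_visual_tokens(1344, 896)   # 48 x 32 grid
print(small, large)
```

Under these assumptions a larger image costs proportionally more tokens, which is the trade-off to watch when feeding high-resolution documents into a 32,768-token context.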
Limitations
- Inference with the 72B model needs roughly two 80 GB A100- or H100-class GPUs for reasonable latency
- Qwen licence places conditions on deployments above 100M monthly active users
- Video understanding degrades past ~20 minutes
- English creative writing lags behind GPT-4o
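The GPU requirement follows from simple arithmetic on weight size. The sketch below is a back-of-the-envelope estimate that assumes a nominal 72 billion parameters, 80 GB per GPU, and counts weights only; KV cache and activations need additional headroom, so treat the result as a lower bound.

```python
import math


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bytes_per_param


def min_gpus(params_billion: float, bytes_per_param: float, gpu_gb: float = 80.0) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(weight_memory_gb(params_billion, bytes_per_param) / gpu_gb)


bf16_gb = weight_memory_gb(72, 2)      # 16-bit weights: 2 bytes per parameter
int4_gb = weight_memory_gb(72, 0.5)    # 4-bit quantised weights: 0.5 bytes per parameter
print(bf16_gb, min_gpus(72, 2))
print(int4_gb, min_gpus(72, 0.5))
```

At 16-bit precision the weights alone come to about 144 GB, which is why two 80 GB cards are the practical minimum; a 4-bit quantised copy would fit on a single card, at some cost in accuracy.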
Use cases
- Document understanding and multilingual OCR
- Long-form video QA up to ~20 minutes
- Chart and diagram reasoning
- Open-weights replacement for closed VLM APIs
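For long-form video QA, the 32,768-token context window limits how many frames can be sampled. The following sketch works through that budget for a 20-minute video; the per-frame token cost (64) and the tokens reserved for text (2,768) are hypothetical values chosen for illustration, not figures from the model's documentation.

```python
def max_frames(context_tokens: int, reserved_text_tokens: int, tokens_per_frame: int) -> int:
    """Number of whole frames that fit once text tokens are set aside."""
    return max(0, (context_tokens - reserved_text_tokens) // tokens_per_frame)


def seconds_between_frames(video_seconds: float, frames: int) -> float:
    """Uniform sampling interval needed to cover the whole video."""
    return video_seconds / frames


CONTEXT_TOKENS = 32_768          # context window from the model specs
VIDEO_SECONDS = 20 * 60          # a 20-minute video
RESERVED_TEXT_TOKENS = 2_768     # assumption: prompt plus answer budget
TOKENS_PER_FRAME = 64            # assumption: cost of one low-resolution frame

frames = max_frames(CONTEXT_TOKENS, RESERVED_TEXT_TOKENS, TOKENS_PER_FRAME)
interval = seconds_between_frames(VIDEO_SECONDS, frames)
print(frames, round(interval, 2))
```

With these assumed numbers, a 20-minute video can be sampled about once every 2.5 seconds; higher-resolution frames or longer prompts reduce the sampling rate accordingly.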
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMMU (val) | ≈64% | 2024-09 |
| DocVQA | ≈96% | 2024-09 |
| MathVista | ≈70% | 2024-09 |
| Video-MME (w/ subtitles) | ≈72% | 2024-09 |
Frequently asked questions
What is Qwen2-VL 72B?
Qwen2-VL 72B is the flagship open vision-language model from Alibaba's Qwen team, released in September 2024. It natively handles arbitrary-resolution images and long videos.
How does Qwen2-VL compare to GPT-4o Vision?
On benchmarks like DocVQA, MathVista, and Video-MME, Qwen2-VL 72B is competitive with GPT-4o. It lags in some English creative tasks but offers open weights.
What licence does Qwen2-VL use?
The Qwen licence permits research and commercial use. Very large deployments (products or services with more than 100M monthly active users) are subject to additional conditions.
Sources
- Qwen2-VL blog — accessed 2026-04-20
- Qwen2-VL 72B on Hugging Face — accessed 2026-04-20