Curiosity · AI Model

Qwen2-VL 72B

Qwen2-VL 72B is Alibaba's flagship open vision-language model, released in September 2024 under the Qwen 2 family. It introduces a Naive Dynamic Resolution visual encoder and M-RoPE multimodal positional embeddings, letting it natively process arbitrary-resolution images and up to ~20-minute videos. The 72B instruct variant is competitive with GPT-4o on MMMU, DocVQA, and ChartQA while being downloadable from Hugging Face under the Qwen licence.

Model specs

Vendor
Alibaba / Qwen team
Family
Qwen 2
Released
2024-09
Context window
32,768 tokens
Modalities
text, vision, video

Strengths

  • Dynamic-resolution encoder avoids fixed-size image patches
  • M-RoPE gives strong temporal understanding for video
  • Competitive with GPT-4o on several benchmarks
  • Open weights for research and most commercial use

Limitations

  • 72B inference needs 2× A100 / H100-class GPUs at reasonable latency
  • Qwen licence has monthly-active-user restrictions at scale
  • Video understanding degrades past ~20 minutes
  • English creative writing behind GPT-4o

Use cases

  • Document understanding and multilingual OCR
  • Long-form video QA up to ~20 minutes
  • Chart and diagram reasoning
  • Open-weights replacement for closed VLM APIs

Benchmarks

BenchmarkScoreAs of
MMMU (val)≈64%2024-09
DocVQA≈96%2024-09
MathVista≈70%2024-09
Video-MME (w/ subtitles)≈72%2024-09

Frequently asked questions

What is Qwen2-VL 72B?

Qwen2-VL 72B is the flagship open vision-language model from Alibaba's Qwen team, released in September 2024. It natively handles arbitrary-resolution images and long videos.

How does Qwen2-VL compare to GPT-4o Vision?

On benchmarks like DocVQA, MathVista, and Video-MME, Qwen2-VL 72B is competitive with GPT-4o. It lags in some English creative tasks but offers open weights.

What licence does Qwen2-VL use?

The Qwen licence permits research and commercial use, with conditions for very large deployments (>100M MAU).

Sources

  1. Qwen2-VL blog — accessed 2026-04-20
  2. Qwen2-VL 72B on Hugging Face — accessed 2026-04-20