InternVL 2.5
InternVL 2.5 is a release in OpenGVLab's open multimodal InternVL family, succeeding InternVL 2.0. Released in December 2024, it ships in sizes from 1B to 78B parameters, and its technical report describes it as the first open-source model to exceed 70% on MMMU, a result obtained with chain-of-thought reasoning and competitive with GPT-4o on that benchmark. Architecturally it pairs an InternViT vision encoder (InternViT-300M for the smaller variants, InternViT-6B for the larger ones) with a Qwen2.5 or InternLM2.5 language backbone, and it can apply chain-of-thought reasoning at test time for long multimodal reasoning.
Model specs
- Vendor
- OpenGVLab (Shanghai AI Lab)
- Family
- InternVL
- Released
- 2024-12
- Context window
- 32,768 tokens
- Modalities
- text, vision, video
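To make the context window concrete, the sketch below estimates how many image tiles fit alongside a text prompt. The 448-pixel tile, 14-pixel patch, and 0.5 pixel-unshuffle ratio are assumptions drawn from the InternVL family's published design, not from the spec list above, so treat the result as a rough budget rather than a guarantee.

```python
CONTEXT_WINDOW = 32_768  # tokens, from the spec list
TILE_SIZE = 448          # assumed tile edge in pixels
PATCH_SIZE = 14          # assumed ViT patch edge in pixels
PIXEL_UNSHUFFLE = 0.5    # assumed downsampling ratio per spatial axis


def tokens_per_tile() -> int:
    """Visual tokens produced by one image tile under the assumptions above."""
    patches_per_side = TILE_SIZE // PATCH_SIZE
    merged_per_side = int(patches_per_side * PIXEL_UNSHUFFLE)
    return merged_per_side ** 2


def max_tiles(text_tokens: int) -> int:
    """Whole tiles that still fit after reserving room for text tokens."""
    remaining = max(CONTEXT_WINDOW - text_tokens, 0)
    return remaining // tokens_per_tile()


if __name__ == "__main__":
    print("tokens per tile:", tokens_per_tile())
    print("tiles with a 2,048-token prompt:", max_tiles(2048))
```

Video inputs spend this budget quickly, since every sampled frame contributes at least one tile.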
Strengths
- First open VLM above 70% on MMMU
- Broad size ladder (1B to 78B) lets users pick the right quality/cost trade-off
- Strong math and OCR results
- Active release cadence with detailed technical reports
Limitations
- The largest variants need multi-GPU serving (around 8× A100-class GPUs)
- Chain-of-thought rollouts increase inference cost
- English creative writing lags Western closed models
- Licence has some commercial restrictions at the top tier
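The deployment limitation can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight memory only; the 20% overhead factor and the GPU memory sizes are illustrative assumptions, and the helper names are hypothetical.

```python
import math

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(params_billion: float, dtype: str = "bf16") -> float:
    """Weights only: billions of parameters times bytes per parameter gives GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


def min_gpus(params_billion: float, dtype: str = "bf16",
             gpu_memory_gb: float = 80.0, overhead: float = 1.2) -> int:
    """Lower bound on GPU count; `overhead` is an assumed runtime margin."""
    needed_gb = weight_memory_gb(params_billion, dtype) * overhead
    return math.ceil(needed_gb / gpu_memory_gb)


if __name__ == "__main__":
    for size in (1, 8, 26, 78):
        print(f"{size}B bf16: {weight_memory_gb(size)} GB of weights, "
              f">= {min_gpus(size)} x 80 GB GPU(s)")
```

This is a lower bound. KV cache, activations for high-resolution images, and tensor-parallel layouts that favour power-of-two GPU counts all push real deployments higher.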
Use cases
- Multimodal reasoning research
- Open fine-tuning backbone for vertical VLMs
- Visual document QA and math reasoning
- Model-card benchmarking and reproducibility
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMMU (val) | ≈70% (78B) | 2024-12 |
| MathVista | ≈72% | 2024-12 |
| OCRBench | ≈852 | 2024-12 |
Frequently asked questions
What is InternVL 2.5?
InternVL 2.5 is OpenGVLab's December 2024 open multimodal family, covering sizes from 1B to 78B parameters and positioned as competitive with GPT-4o on multimodal reasoning benchmarks.
How does InternVL 2.5 compare to Qwen2-VL?
Both are leading open VLMs. InternVL 2.5 leads slightly on MMMU and MathVista thanks to test-time scaling; Qwen2-VL leads on long-video understanding.
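One common form of test-time scaling is to sample several chain-of-thought answers and keep the most frequent one. The sketch below illustrates that idea only; it is not taken from the InternVL code base, and `sample_answer` is a hypothetical stand-in for a call that runs the model once and returns its final answer.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer after trimming and lower-casing."""
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        raise ValueError("no non-empty answers to vote on")
    # Ties resolve to the answer that appeared first.
    return Counter(normalized).most_common(1)[0][0]


def answer_with_voting(sample_answer: Callable[[], str], n_samples: int = 8) -> str:
    """Call the (hypothetical) sampler n_samples times and vote on the results."""
    return majority_vote([sample_answer() for _ in range(n_samples)])


if __name__ == "__main__":
    # Canned outputs standing in for sampled model generations.
    canned = iter(["B", "b ", "A", "B", "C", "b", "A", "B"])
    print(answer_with_voting(lambda: next(canned), n_samples=8))
```

Inference cost grows roughly linearly with `n_samples`, which is the trade-off behind the added cost of chain-of-thought rollouts.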
Can I use InternVL 2.5 commercially?
Smaller variants are permissively licensed; the 78B model has extra restrictions. Check the Hugging Face model card for specifics.
Sources
- InternVL 2.5 paper (arXiv) — accessed 2026-04-20
- InternVL 2.5 on Hugging Face — accessed 2026-04-20