Curiosity · AI Model

Gemini 2.5 Flash

Gemini 2.5 Flash is Google's small-but-capable member of the Gemini 2.5 family, released in mid-2025. It inherits 2.5 Pro's thinking capability and native multimodality — text, vision, audio, video all in one model — while running fast enough and cheap enough to be the default Google model for production chat, RAG, and agent workloads.

Model specs

Vendor: Google
Family: Gemini 2.5
Released: 2025-06
Context window: 1,000,000 tokens
Modalities: text, vision, audio, video, code
Input price: $0.3/M tok
Output price: $2.5/M tok
Pricing as of: 2026-04-20

Strengths

Native multimodality covers text, images, audio, and video in one model
Tunable thinking budget lets devs trade cost for accuracy
1M-token context supports hour-long video or entire codebases
Strong price/performance — cheaper than GPT-5 mini and Haiku 4.5 on many metrics

Limitations

Trails Gemini 2.5 Pro and frontier models on the hardest reasoning tasks
Google's function-calling ergonomics lag OpenAI/Anthropic in developer experience
Quality can vary with modality — video understanding is strong, audio generation limited

Use cases

High-volume chat and RAG on Vertex AI or AI Studio
Long-document and long-video analysis
Audio transcription + summarisation pipelines
Agent workloads with Google Search or Vertex tool grounding

Benchmarks

Benchmark	Score	As of
MMLU	≈85%	2025-06
GPQA Diamond	≈72%	2025-06
Video-MME	≈74%	2025-06

Frequently asked questions

What is Gemini 2.5 Flash?

Gemini 2.5 Flash is the fast, cost-efficient tier of Google's Gemini 2.5 family. It is a thinking model with a 1M-token context window and native support for text, images, audio, and video — positioned as the default production model on Vertex AI and AI Studio.

What makes Gemini 2.5 Flash different from 2.0 Flash?

Gemini 2.5 Flash adds a thinking mode, better coding and reasoning scores, and stronger multimodal quality versus 2.0 Flash. It keeps the same fast, low-cost positioning but with noticeably higher quality at the top end.

How much does Gemini 2.5 Flash cost?

As of April 2026, Gemini 2.5 Flash is priced at roughly USD 0.30 per million input tokens and USD 2.50 per million output tokens, with discounted rates for cached context on Vertex AI.

Can Gemini 2.5 Flash process video?

Yes. Gemini 2.5 Flash natively processes video input, including frames and synchronized audio, and can summarise or answer questions across hour-long recordings thanks to the 1M-token context.

Sources

Google — Gemini 2.5 Flash — accessed 2026-04-20
Google Cloud — Gemini pricing — accessed 2026-04-20