Curiosity · AI Model

Gemini 2.5 Flash

Gemini 2.5 Flash is Google's small-but-capable member of the Gemini 2.5 family, released in mid-2025. It inherits 2.5 Pro's thinking capability and native multimodality — text, vision, audio, video all in one model — while running fast enough and cheap enough to be the default Google model for production chat, RAG, and agent workloads.

Model specs

Vendor
Google
Family
Gemini 2.5
Released
2025-06
Context window
1,000,000 tokens
Modalities
text, vision, audio, video, code
Input price
$0.3/M tok
Output price
$2.5/M tok
Pricing as of
2026-04-20

Strengths

  • Native multimodality covers text, images, audio, and video in one model
  • Tunable thinking budget lets devs trade cost for accuracy
  • 1M-token context supports hour-long video or entire codebases
  • Strong price/performance — cheaper than GPT-5 mini and Haiku 4.5 on many metrics

Limitations

  • Trails Gemini 2.5 Pro and frontier models on the hardest reasoning tasks
  • Google's function-calling ergonomics lag OpenAI/Anthropic in developer experience
  • Quality can vary with modality — video understanding is strong, audio generation limited

Use cases

  • High-volume chat and RAG on Vertex AI or AI Studio
  • Long-document and long-video analysis
  • Audio transcription + summarisation pipelines
  • Agent workloads with Google Search or Vertex tool grounding

Benchmarks

BenchmarkScoreAs of
MMLU≈85%2025-06
GPQA Diamond≈72%2025-06
Video-MME≈74%2025-06

Frequently asked questions

What is Gemini 2.5 Flash?

Gemini 2.5 Flash is the fast, cost-efficient tier of Google's Gemini 2.5 family. It is a thinking model with a 1M-token context window and native support for text, images, audio, and video — positioned as the default production model on Vertex AI and AI Studio.

What makes Gemini 2.5 Flash different from 2.0 Flash?

Gemini 2.5 Flash adds a thinking mode, better coding and reasoning scores, and stronger multimodal quality versus 2.0 Flash. It keeps the same fast, low-cost positioning but with noticeably higher quality at the top end.

How much does Gemini 2.5 Flash cost?

As of April 2026, Gemini 2.5 Flash is priced at roughly USD 0.30 per million input tokens and USD 2.50 per million output tokens, with discounted rates for cached context on Vertex AI.

Can Gemini 2.5 Flash process video?

Yes. Gemini 2.5 Flash natively processes video input, including frames and synchronized audio, and can summarise or answer questions across hour-long recordings thanks to the 1M-token context.

Sources

  1. Google — Gemini 2.5 Flash — accessed 2026-04-20
  2. Google Cloud — Gemini pricing — accessed 2026-04-20