Curiosity · AI Model

Gemini 2.0 Flash

Gemini 2.0 Flash is Google's December 2024 release, the model that made Gemini agents feel first-class. It introduced native tool use with Google Search grounding, a Multimodal Live API for real-time audio/video, and image generation inside the same model — all on the 1M-token Flash architecture.

Model specs

Vendor
Google
Family
Gemini 2.0
Released
2024-12
Context window
1,000,000 tokens
Modalities
text, vision, audio, video, code
Input price
$0.1/M tok
Output price
$0.4/M tok
Pricing as of
2026-04-20

Strengths

  • Defined Gemini's native tool-use story — Search, code execution, image gen
  • Very low pricing — one of the cheapest 1M-context models at launch
  • Multimodal Live API enables low-latency voice and video agents
  • Broad ecosystem support via AI Studio, Vertex, and the Gemini API

Limitations

  • Surpassed by Gemini 2.5 Flash on reasoning and coding
  • Image generation inside the model was preview-quality at launch
  • No extended thinking mode — use 2.0 Flash Thinking or 2.5 Flash for reasoning

Use cases

  • Real-time multimodal assistants via Multimodal Live API
  • Agent workflows with Google Search grounding
  • Image generation + editing (preview) inside chat
  • High-volume RAG and summarisation on Vertex AI

Benchmarks

BenchmarkScoreAs of
MMLU-Pro≈77%2024-12
GPQA Diamond≈62%2024-12
Video-MME≈70%2024-12

Frequently asked questions

What is Gemini 2.0 Flash?

Gemini 2.0 Flash is Google's December 2024 Flash-tier model, designed around agents and native tool use. It supports multimodal input and output, native Google Search grounding, and a Multimodal Live API for real-time voice and video interactions.

How does Gemini 2.0 Flash compare to 2.5 Flash?

Gemini 2.5 Flash is a direct upgrade — better reasoning, stronger coding, optional thinking mode, and improved multimodal quality. For new projects, use 2.5 Flash. 2.0 Flash remains available for existing pipelines.

How much does Gemini 2.0 Flash cost?

As of April 2026, Gemini 2.0 Flash is priced at roughly USD 0.10 per million input tokens and USD 0.40 per million output tokens — among the cheapest 1M-context models available.

What is the Multimodal Live API in Gemini 2.0 Flash?

It's a WebSocket-based streaming API that accepts audio and video input in real time and emits audio or text replies — roughly Google's equivalent of OpenAI's Realtime API, used for voice agents and video assistants.

Sources

  1. Google — Gemini 2.0 Flash — accessed 2026-04-20
  2. Google Cloud — Gemini pricing — accessed 2026-04-20