Curiosity · AI Model

Gemini 2.0 Flash

Gemini 2.0 Flash is Google's December 2024 release, the model that made Gemini agents feel first-class. It introduced native tool use with Google Search grounding, a Multimodal Live API for real-time audio/video, and image generation inside the same model — all on the 1M-token Flash architecture.

Model specs

Vendor: Google
Family: Gemini 2.0
Released: 2024-12
Context window: 1,000,000 tokens
Modalities: text, vision, audio, video, code
Input price: $0.1/M tok
Output price: $0.4/M tok
Pricing as of: 2026-04-20

Strengths

Defined Gemini's native tool-use story — Search, code execution, image gen
Very low pricing — one of the cheapest 1M-context models at launch
Multimodal Live API enables low-latency voice and video agents
Broad ecosystem support via AI Studio, Vertex, and the Gemini API

Limitations

Surpassed by Gemini 2.5 Flash on reasoning and coding
Image generation inside the model was preview-quality at launch
No extended thinking mode — use 2.0 Flash Thinking or 2.5 Flash for reasoning

Use cases

Real-time multimodal assistants via Multimodal Live API
Agent workflows with Google Search grounding
Image generation + editing (preview) inside chat
High-volume RAG and summarisation on Vertex AI

Benchmarks

Benchmark	Score	As of
MMLU-Pro	≈77%	2024-12
GPQA Diamond	≈62%	2024-12
Video-MME	≈70%	2024-12

Frequently asked questions

What is Gemini 2.0 Flash?

Gemini 2.0 Flash is Google's December 2024 Flash-tier model, designed around agents and native tool use. It supports multimodal input and output, native Google Search grounding, and a Multimodal Live API for real-time voice and video interactions.

How does Gemini 2.0 Flash compare to 2.5 Flash?

Gemini 2.5 Flash is a direct upgrade — better reasoning, stronger coding, optional thinking mode, and improved multimodal quality. For new projects, use 2.5 Flash. 2.0 Flash remains available for existing pipelines.

How much does Gemini 2.0 Flash cost?

As of April 2026, Gemini 2.0 Flash is priced at roughly USD 0.10 per million input tokens and USD 0.40 per million output tokens — among the cheapest 1M-context models available.

What is the Multimodal Live API in Gemini 2.0 Flash?

It's a WebSocket-based streaming API that accepts audio and video input in real time and emits audio or text replies — roughly Google's equivalent of OpenAI's Realtime API, used for voice agents and video assistants.

Sources

Google — Gemini 2.0 Flash — accessed 2026-04-20
Google Cloud — Gemini pricing — accessed 2026-04-20