Gemma 2 2B
Gemma 2 2B is the smallest member of Google DeepMind's Gemma 2 open-weight family, released in 2024. At roughly 2.6 billion parameters, it is trained with knowledge distillation from larger Gemma 2 teacher models, which gives it unusually strong benchmark scores for its size. It is designed for edge, browser, and offline inference, often running via MediaPipe or WebGPU-based engines.
Model specs

| Spec | Value |
|---|---|
| Vendor | Google DeepMind |
| Family | Gemma 2 |
| Released | 2024-07 |
| Context window | 8,192 tokens |
| Modalities | text |
Strengths
- Distilled from 27B teacher — strong quality for 2.6B
- Runs in a browser (WebLLM) or on a phone
- Permissive Gemma Terms of Use allow commercial use
Limitations
- Small 8k context
- Weak math reasoning (low GSM8K)
- Gemma Terms are not a standard OSS licence
Use cases
- On-device chatbots on phones and laptops
- Browser-based AI demos via WebGPU / WebLLM
- Fine-tuning experiments on small GPUs
- Edge agents in bandwidth-constrained environments
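For the chatbot use cases above, Gemma models expect a turn-based prompt format delimited by `<start_of_turn>` and `<end_of_turn>` markers. A minimal sketch in Python, assuming the documented Gemma chat format (the `format_gemma_prompt` helper is illustrative, not part of any library; in practice a tokenizer's chat template does this for you and typically prepends a `<bos>` token):

```python
# Illustrative helper: builds a Gemma-style chat prompt by hand.
# Gemma delimits each turn with <start_of_turn>role ... <end_of_turn>.
def format_gemma_prompt(user_message: str, history=None) -> str:
    """Build a Gemma chat prompt from prior (user, model) turn pairs."""
    parts = []
    for user_turn, model_turn in (history or []):
        parts.append(f"<start_of_turn>user\n{user_turn}<end_of_turn>\n")
        parts.append(f"<start_of_turn>model\n{model_turn}<end_of_turn>\n")
    # Open the next user turn, then cue the model to respond.
    parts.append(f"<start_of_turn>user\n{user_message}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = format_gemma_prompt("What is the capital of France?")
print(prompt)
```

The same string is what a local runtime (llama.cpp, WebLLM, and similar) ultimately feeds the model when you use its chat API with a Gemma model.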
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU | ~52% | 2026-04 |
| GSM8K | ~24% | 2026-04 |
| MT-Bench | ~7.5 | 2026-04 |
Frequently asked questions
What is Gemma 2 2B?
Gemma 2 2B is Google DeepMind's 2.6-billion-parameter open-weight language model, distilled from the larger Gemma 2 27B model. It is sized for phones, browsers, and single-GPU inference.
Can Gemma 2 2B run in the browser?
Yes, via WebGPU engines like WebLLM or MediaPipe GenAI. Quantised to 4-bit, its weights occupy roughly 1.3 GB, and it runs at interactive speeds on modern laptops.
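The quantised footprint can be sanity-checked with back-of-envelope arithmetic, using the 2.6B parameter count from this page (a rough estimate only: real on-disk sizes vary by format, and some tensors such as embeddings are often kept at higher precision):

```python
# Rough weight-memory estimate: parameters x bits per parameter.
# Ignores runtime overhead (KV cache, activations) and mixed-precision tensors.
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate model weight size in gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

print(f"fp16 : {weight_memory_gb(2.6e9, 16):.1f} GB")  # ~5.2 GB
print(f"4-bit: {weight_memory_gb(2.6e9, 4):.1f} GB")   # ~1.3 GB
```

This is why 4-bit quantisation is the usual choice for browser and phone deployment: it cuts the weight footprint by 4x relative to fp16.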
Sources
- Gemma 2 2B on HuggingFace — accessed 2026-04-20
- Google — Gemma 2 announcement — accessed 2026-04-20