Gemma 2 2B
Gemma 2 2B is the smallest member of Google DeepMind's Gemma 2 open-weight family, released in 2024. At roughly 2.6 billion parameters, it is trained with knowledge distillation from larger Gemma 2 teacher models, which gives it unusually strong benchmark scores for its size. It is designed for edge, browser, and offline inference, often running via MediaPipe or WebGPU-based engines.
Model specs

| Spec | Value |
|---|---|
| Vendor | Google DeepMind |
| Family | Gemma 2 |
| Released | 2024-07 |
| Context window | 8,192 tokens |
| Modalities | text |
Strengths
- Distilled from 27B teacher — strong quality for 2.6B
- Runs in a browser (WebLLM) or on a phone
- Permissive Gemma Terms of Use allow commercial use
Limitations
- Small 8k context
- Weak math reasoning (low GSM8K)
- Gemma Terms are not a standard OSS licence
Use cases
- On-device chatbots on phones and laptops
- Browser-based AI demos via WebGPU / WebLLM
- Fine-tuning experiments on small GPUs
- Edge agents in bandwidth-constrained environments
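For the chatbot use cases above, Gemma models expect a turn-based prompt format delimited by `<start_of_turn>` and `<end_of_turn>` markers. A minimal sketch in Python, assuming the documented Gemma chat format (the `format_gemma_prompt` helper is illustrative, not part of any library; in practice a tokenizer's chat template does this for you and typically prepends a `<bos>` token):

```python
# Illustrative helper: builds a Gemma-style chat prompt by hand.
# Gemma delimits each turn with <start_of_turn>role ... <end_of_turn>.
def format_gemma_prompt(user_message: str, history=None) -> str:
    """Build a Gemma chat prompt from prior (user, model) turn pairs."""
    parts = []
    for user_turn, model_turn in (history or []):
        parts.append(f"<start_of_turn>user\n{user_turn}<end_of_turn>\n")
        parts.append(f"<start_of_turn>model\n{model_turn}<end_of_turn>\n")
    # Open the next user turn, then cue the model to respond.
    parts.append(f"<start_of_turn>user\n{user_message}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = format_gemma_prompt("What is the capital of France?")
print(prompt)
```

The same string is what a local runtime (llama.cpp, WebLLM, and similar) ultimately feeds the model when you use its chat API with a Gemma model.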
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU | ~52% | 2026-04 |
| GSM8K | ~24% | 2026-04 |
| MT-Bench | ~7.5 | 2026-04 |
Frequently asked questions
What is Gemma 2 2B?
Gemma 2 2B is Google DeepMind's 2.6-billion-parameter open-weight language model, distilled from the larger Gemma 2 27B model. It is sized for phones, browsers, and single-GPU inference.
Can Gemma 2 2B run in the browser?
Yes, via WebGPU engines like WebLLM or MediaPipe GenAI. Quantised to 4-bit, its weights occupy roughly 1.3 GB, and it runs at interactive speeds on modern laptops.
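The quantised footprint can be sanity-checked with back-of-envelope arithmetic, using the 2.6B parameter count from this page (a rough estimate only: real on-disk sizes vary by format, and some tensors such as embeddings are often kept at higher precision):

```python
# Rough weight-memory estimate: parameters x bits per parameter.
# Ignores runtime overhead (KV cache, activations) and mixed-precision tensors.
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate model weight size in gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

print(f"fp16 : {weight_memory_gb(2.6e9, 16):.1f} GB")  # ~5.2 GB
print(f"4-bit: {weight_memory_gb(2.6e9, 4):.1f} GB")   # ~1.3 GB
```

This is why 4-bit quantisation is the usual choice for browser and phone deployment: it cuts the weight footprint by 4x relative to fp16.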
Sources
- Gemma 2 2B on HuggingFace — accessed 2026-04-20
- Google — Gemma 2 announcement — accessed 2026-04-20