Gemma 2 2B

Gemma 2 2B is the smallest member of Google DeepMind's Gemma 2 open-weight family, released in July 2024. At 2.6 billion parameters, it is distilled from the larger Gemma 2 teachers (9B/27B) and delivers surprisingly strong benchmark scores for its size. It is designed for edge, browser, and offline inference, often running via MediaPipe or WebGPU-based engines.

Model specs

Vendor: Google
Family: Gemma 2
Released: 2024-07
Context window: 8,192 tokens
Modalities: text

Strengths

  • Distilled from 27B teacher — strong quality for 2.6B
  • Runs in a browser (WebLLM) or on a phone
  • Permissive Gemma Terms of Use allow commercial use

Limitations

  • Small 8k context
  • Weak math reasoning (low GSM8K)
  • Gemma Terms are not a standard OSS licence

Use cases

  • On-device chatbots on phones and laptops
  • Browser-based AI demos via WebGPU / WebLLM
  • Fine-tuning experiments on small GPUs
  • Edge agents in bandwidth-constrained environments
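
For on-device chat, the instruction-tuned Gemma variants expect a turn-based prompt format. A minimal sketch of building that prompt in Python; the turn markers follow Gemma's published chat template, but verify the exact strings against the model card before relying on them:

```python
def gemma_chat_prompt(messages):
    """Build a prompt string in Gemma's turn-based chat format.

    messages: list of {"role": "user" | "model", "content": str}.
    The model generates its reply after the trailing
    "<start_of_turn>model" marker.
    """
    parts = []
    for m in messages:
        parts.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = gemma_chat_prompt([{"role": "user", "content": "Hi there!"}])
```

Base (non-instruction-tuned) checkpoints do not use these markers; plain text completion applies there.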

Benchmarks

Benchmark    Score    As of
MMLU         ~52%     2026-04
GSM8K        ~24%     2026-04
MT-Bench     ~7.5     2026-04

Frequently asked questions

What is Gemma 2 2B?

Gemma 2 2B is Google DeepMind's 2.6-billion-parameter open-weight language model, distilled from the larger Gemma 2 27B model. It is sized for phones, browsers, and single-GPU inference.
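
Distillation here means the small model is trained to match a larger teacher's output distribution rather than only the one-hot next token. A toy sketch of the standard temperature-scaled KL objective (generic distillation math, not Google's actual training recipe):

```python
import math


def softmax(logits, T=1.0):
    """Softmax with temperature T; max is subtracted for numerical stability."""
    z = [x / T for x in logits]
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over the vocabulary, temperature-scaled:
    the student is pushed toward the teacher's full soft distribution,
    not just its argmax token."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


# Identical logits give zero loss; diverging logits give a positive loss.
same = distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distill_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In practice this term is usually mixed with the ordinary cross-entropy loss on the hard labels.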

Can Gemma 2 2B run in the browser?

Yes — via WebGPU engines like WebLLM or MediaPipe GenAI. Quantised to 4-bit, its weights come to roughly 1.3 GB, and it runs at interactive speeds on modern laptops.
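
The browser-friendliness comes down to arithmetic on the weight footprint. A back-of-the-envelope sketch, using the 2.6B parameter count from this card and ignoring runtime overheads such as the KV cache and activations:

```python
def weight_footprint_gb(n_params, bits_per_param):
    """Approximate weight storage: params x bits / 8 bytes, in GB."""
    return n_params * bits_per_param / 8 / 1e9


fp16 = weight_footprint_gb(2.6e9, 16)  # ~5.2 GB: too heavy for a browser tab
q4 = weight_footprint_gb(2.6e9, 4)     # ~1.3 GB: feasible to ship over WebGPU
```

The 4x shrink from fp16 to 4-bit is what moves the model from "server GPU" territory into laptop and phone range.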

Sources

  1. Gemma 2 2B on HuggingFace — accessed 2026-04-20
  2. Google — Gemma 2 announcement — accessed 2026-04-20