
Gemma 3 12B

Gemma 3 12B is the mid-size member of the Gemma 3 family (March 2025), sized to fit on a single H100 at bf16 or on a 24 GB consumer GPU with quantisation. Like its 4B sibling, it adds image input, a 128k-token context window, and coverage of 140+ languages over Gemma 2, with substantially stronger reasoning, coding, and math performance. Google positions it as a sweet-spot open model for developers who need more capability than Llama 3.1 8B without the hardware demands of 27B-plus models.
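
A minimal loading sketch, assuming the instruction-tuned weights are published on Hugging Face as google/gemma-3-12b-it and that a transformers release with Gemma 3 support plus bitsandbytes are installed; the 4-bit config targets a 24 GB GPU, and dropping it gives plain bf16 loading for an 80 GB H100.

```python
# Hypothetical sketch: load Gemma 3 12B instruction-tuned weights in 4-bit on a 24 GB GPU.
# Assumes a transformers release with Gemma 3 support, bitsandbytes, and the model id below.
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-12b-it"  # assumed Hugging Face id for the instruction-tuned 12B

quant = BitsAndBytesConfig(
    load_in_4bit=True,                      # int4 weights: roughly a quarter of bf16 memory
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
)

processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quant,  # omit this for full-precision bf16 on an 80 GB H100
    device_map="auto",
)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Summarise the Gemma 3 family in two sentences."}]},
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```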

Model specs

  • Vendor: Google DeepMind
  • Family: Gemma 3
  • Released: 2025-03
  • Context window: 128,000 tokens
  • Modalities: text, vision

Strengths

  • Better reasoning and coding than Gemma 3 4B
  • Fits on a single H100 at bf16 or 24 GB at int4
  • Multimodal + multilingual in an open model
  • Training recipe built on the same research as Gemini 2.0, including distillation from larger teacher models

Limitations

  • Still below 27B / 70B for the hardest reasoning tasks
  • Released under the Gemma terms of use rather than a standard open-source licence
  • Vision input is limited to still images (no video support)
  • Too demanding for hardware sized for the 4B tier

Use cases

  • Open-weights chat and coding assistant on a single GPU
  • Enterprise RAG over long documents at 128k
  • Multilingual support in regulated on-prem settings
  • Vision-augmented agents on modest hardware (see the image-input sketch after this list)
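
To illustrate the vision-augmented-agent case, here is a hedged sketch of the image-input path, reusing the model and processor from the loading sketch above; the image URL and prompt are placeholders, and the message format follows the chat-template convention recent transformers releases use for Gemma 3.

```python
# Hypothetical sketch: ask Gemma 3 12B a question about an image.
# Reuses `model` and `processor` from the loading sketch above; the URL is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/receipt.png"},  # placeholder image URL
            {"type": "text", "text": "List the line items and the total on this receipt."},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```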

Benchmarks

  • MMLU (5-shot): ≈72% (as of 2025-03)
  • HumanEval: ≈75% (as of 2025-03)
  • MMMU: ≈58% (as of 2025-03)

Frequently asked questions

What is Gemma 3 12B?

The mid-size member of Google DeepMind's Gemma 3 family — a 12B open multimodal LLM with 128k context and 140+ language coverage, released March 2025.

Where does Gemma 3 12B sit versus Llama 3.1?

It outperforms Llama 3.1 8B on most benchmarks and approaches Llama 3.1 70B on many, while adding image input and a 128k context window.

What hardware does it need?

In bf16, inference fits on a single H100 (80 GB); 24 GB consumer GPUs can run int4 quantisations.
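
A rough, weights-only back-of-the-envelope check of those figures (ignoring KV cache, activations, and runtime overhead, and assuming roughly 12B parameters):

```python
# Rough, weights-only memory estimate for a ~12B-parameter model.
# KV cache, activations, and framework overhead add several GB on top of this.
params = 12e9

bytes_per_param = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

for fmt, nbytes in bytes_per_param.items():
    gib = params * nbytes / 1024**3
    print(f"{fmt}: ~{gib:.0f} GiB of weights")

# bf16: ~22 GiB -> comfortable on an 80 GB H100, too tight for a 24 GB card
# int4: ~6 GiB  -> leaves headroom for KV cache on a 24 GB consumer GPU
```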

Sources

  1. Google DeepMind — Gemma 3 — accessed 2026-04-20
  2. Gemma 3 technical report — accessed 2026-04-20