Gemma 3 12B
Gemma 3 12B is the mid-size member of the Gemma 3 family (March 2025), sized to fit on a single H100 or, with quantisation, a 24 GB consumer GPU. Like its 4B sibling, it adds image input, 128k context, and 140+ language coverage over Gemma 2, but with substantially stronger reasoning, coding, and math performance. Google positions it as a sweet-spot open model for developers who need more capability than Llama 3.1 8B but without the hardware demands of 27B+ models.
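The hardware-fit claim can be sanity-checked with back-of-the-envelope arithmetic. This sketch counts the memory needed for the raw weights only; real deployments add KV cache, activations, and framework overhead on top, so treat the numbers as lower bounds:

```python
# Rough VRAM needed for the weights alone of a 12B-parameter model.
# KV cache, activations, and runtime overhead add several GB more.

PARAMS = 12e9  # 12 billion parameters

def weight_gb(bytes_per_param: float) -> float:
    """Gigabytes needed to hold the raw weights."""
    return PARAMS * bytes_per_param / 1024**3

bf16 = weight_gb(2.0)   # bf16: 2 bytes per parameter
int4 = weight_gb(0.5)   # int4: 0.5 bytes per parameter

print(f"bf16 weights: {bf16:.1f} GB")  # ~22.4 GB, well under an H100's 80 GB
print(f"int4 weights: {int4:.1f} GB")  # ~5.6 GB, headroom on a 24 GB card
```

This is why bf16 inference needs a data-centre GPU while int4 quantisations leave room for cache and activations on a 24 GB consumer card.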
Model specs
- Vendor: Google DeepMind
- Family: Gemma 3
- Released: 2025-03
- Context window: 128,000 tokens
- Modalities: text, vision
Strengths
- Better reasoning and coding than Gemma 3 4B
- Fits on a single H100 at bf16, or a 24 GB GPU at int4
- Multimodal + multilingual in an open model
- Training recipe includes distillation from larger Gemini-class models
Limitations
- Still below 27B / 70B for the hardest reasoning tasks
- Licence has Gemma-style usage conditions
- Vision capped at images (no video)
- Compute-heavy for 4B-tier hardware
Use cases
- Open-weights chat and coding assistant on a single GPU
- Enterprise RAG over long documents at 128k
- Multilingual support in regulated on-prem settings
- Vision-augmented agents with modest hardware
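For the long-document RAG use case, a pre-flight token-budget check helps avoid silently truncating retrieved context. A minimal sketch, using the 128,000-token window from the specs above; the function and parameter names are illustrative, not a real API:

```python
# Check whether retrieved document chunks fit in the context window,
# reserving room for the prompt template and the model's reply.
# All names are hypothetical; only the 128k window comes from the specs.

CONTEXT_WINDOW = 128_000  # Gemma 3 12B context window, in tokens

def fits_in_context(chunk_token_counts: list[int],
                    prompt_tokens: int = 1_000,
                    reply_reserve: int = 4_000) -> bool:
    """True if prompt + chunks + reserved output fit in the window."""
    budget = CONTEXT_WINDOW - prompt_tokens - reply_reserve
    return sum(chunk_token_counts) <= budget

print(fits_in_context([3_000] * 40))  # 120,000 tokens -> True
print(fits_in_context([3_000] * 45))  # 135,000 tokens -> False
```

Reserving output tokens up front matters at 128k: a retriever that fills the entire window leaves the model no room to generate its answer.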
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU (5-shot) | ≈72% | 2025-03 |
| HumanEval | ≈75% | 2025-03 |
| MMMU | ≈58% | 2025-03 |
Frequently asked questions
What is Gemma 3 12B?
The mid-size member of Google DeepMind's Gemma 3 family — a 12B open multimodal LLM with 128k context and 140+ language coverage, released March 2025.
Where does Gemma 3 12B sit versus Llama 3.1?
It is competitive with Llama 3.1 8B and approaches Llama 3.1 70B on many benchmarks while adding multimodal input and a 128k context.
What hardware does it need?
BF16 inference fits on a single H100 (80 GB); 24 GB consumer GPUs can run int4 quantisations.
Sources
- Google DeepMind — Gemma 3 — accessed 2026-04-20
- Gemma 3 technical report — accessed 2026-04-20