Gemma 3 12B
Gemma 3 12B is the mid-size member of the Gemma 3 family (March 2025), sized to fit on a single H100 or, with quantisation, a 24 GB consumer GPU. Like its 4B sibling, it adds image input, 128k context, and 140+ language coverage over Gemma 2, but with substantially stronger reasoning, coding, and math performance. Google positions it as a sweet-spot open model for developers who need more capability than Llama 3.1 8B but without the hardware demands of 27B+ models.
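The hardware-fit claim can be sanity-checked with back-of-the-envelope arithmetic. This sketch counts the memory needed for the raw weights only; real deployments add KV cache, activations, and framework overhead on top, so treat the numbers as lower bounds:

```python
# Rough VRAM needed for the weights alone of a 12B-parameter model.
# KV cache, activations, and runtime overhead add several GB more.

PARAMS = 12e9  # 12 billion parameters

def weight_gb(bytes_per_param: float) -> float:
    """Gigabytes needed to hold the raw weights."""
    return PARAMS * bytes_per_param / 1024**3

bf16 = weight_gb(2.0)   # bf16: 2 bytes per parameter
int4 = weight_gb(0.5)   # int4: 0.5 bytes per parameter

print(f"bf16 weights: {bf16:.1f} GB")  # ~22.4 GB, well under an H100's 80 GB
print(f"int4 weights: {int4:.1f} GB")  # ~5.6 GB, headroom on a 24 GB card
```

This is why bf16 inference needs a data-centre GPU while int4 quantisations leave room for cache and activations on a 24 GB consumer card.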
Model specs
- Vendor: Google DeepMind
- Family: Gemma 3
- Released: 2025-03
- Context window: 128,000 tokens
- Modalities: text, vision
Strengths
- Better reasoning and coding than Gemma 3 4B
- Fits on a single H100 at bf16, or a 24 GB GPU at int4
- Multimodal + multilingual in an open model
- Training recipe includes distillation from larger Gemini-class models
Limitations
- Still below 27B / 70B for the hardest reasoning tasks
- Licence has Gemma-style usage conditions
- Vision capped at images (no video)
- Compute-heavy for 4B-tier hardware
Use cases
- Open-weights chat and coding assistant on a single GPU
- Enterprise RAG over long documents at 128k
- Multilingual support in regulated on-prem settings
- Vision-augmented agents with modest hardware
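For the long-document RAG use case, a pre-flight token-budget check helps avoid silently truncating retrieved context. A minimal sketch, using the 128,000-token window from the specs above; the function and parameter names are illustrative, not a real API:

```python
# Check whether retrieved document chunks fit in the context window,
# reserving room for the prompt template and the model's reply.
# All names are hypothetical; only the 128k window comes from the specs.

CONTEXT_WINDOW = 128_000  # Gemma 3 12B context window, in tokens

def fits_in_context(chunk_token_counts: list[int],
                    prompt_tokens: int = 1_000,
                    reply_reserve: int = 4_000) -> bool:
    """True if prompt + chunks + reserved output fit in the window."""
    budget = CONTEXT_WINDOW - prompt_tokens - reply_reserve
    return sum(chunk_token_counts) <= budget

print(fits_in_context([3_000] * 40))  # 120,000 tokens -> True
print(fits_in_context([3_000] * 45))  # 135,000 tokens -> False
```

Reserving output tokens up front matters at 128k: a retriever that fills the entire window leaves the model no room to generate its answer.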
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU (5-shot) | ≈72% | 2025-03 |
| HumanEval | ≈75% | 2025-03 |
| MMMU | ≈58% | 2025-03 |
Frequently asked questions
What is Gemma 3 12B?
The mid-size member of Google DeepMind's Gemma 3 family — a 12B open multimodal LLM with 128k context and 140+ language coverage, released March 2025.
Where does Gemma 3 12B sit versus Llama 3.1?
It is competitive with Llama 3.1 8B and approaches Llama 3.1 70B on many benchmarks while adding multimodal input and a 128k context.
What hardware does it need?
BF16 inference fits on a single H100 (80 GB); 24 GB consumer GPUs can run int4 quantisations.
Sources
- Google DeepMind — Gemma 3 — accessed 2026-04-20
- Gemma 3 technical report — accessed 2026-04-20