Curiosity · AI Model
Nemotron Mini 4B Instruct
Nemotron Mini 4B Instruct, released by NVIDIA in September 2024 as part of its ACE game-character and on-device stack, is a 4-billion-parameter instruct model derived from the larger Nemotron-4 15B via NVIDIA's Minitron pruning-and-distillation pipeline. It is tuned for fast chat and function calling on consumer RTX GPUs and inside NVIDIA's Riva and ACE agents.
Model specs
- Vendor
- NVIDIA
- Family
- Nemotron Mini
- Released
- 2024-09
- Context window
- 4,096 tokens
- Modalities
- text
Strengths
- 4B footprint runs on consumer GPUs and Jetson devices
- Fast inference with TensorRT-LLM and NIM microservices
- Good function-calling for its size
Limitations
- Small capacity limits reasoning and long-context performance
- Optimised for the NVIDIA stack; portability to other runtimes is secondary
- Released under NVIDIA's community model license, which carries some usage restrictions
Use cases
- On-device chat on RTX workstations and laptops
- NPC and game-character dialogue via NVIDIA ACE
- Low-latency function-calling agents
- Classroom demos of compact instruction-tuned LLMs
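The low-latency function-calling use case follows a simple loop: the model emits a structured tool call, and the host application parses and executes it. The sketch below shows that dispatch pattern in plain Python; the JSON shape and the `get_weather` tool are illustrative assumptions, not Nemotron Mini's actual tool-call format, which is defined by its prompt template on Hugging Face.

```python
import json

# Hypothetical tool registry for a local function-calling agent.
# Real deployments would register application-specific functions here.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and execute it.

    Expects a payload like {"name": ..., "arguments": {...}}; this
    shape is an assumption for illustration.
    """
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Simulated model output standing in for a real generation.
print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Keeping the dispatcher this thin is what makes a 4B model attractive here: the model only has to produce a small, well-formed JSON object, and all real work happens in host code.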
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MMLU | ≈56% | 2024-09 |
| HumanEval | ≈45% | 2024-09 |
Frequently asked questions
What is Nemotron Mini 4B Instruct?
Nemotron Mini 4B Instruct is NVIDIA's 4-billion-parameter on-device chat model, derived from Nemotron-4 15B via Minitron pruning and distillation and tuned for function calling and character dialogue.
What is Minitron?
Minitron is NVIDIA's pruning-and-distillation recipe that turns larger LLMs into smaller ones while preserving most of the quality. Nemotron Mini 4B is one of its headline results.
Where can I use Nemotron Mini?
Weights are available on Hugging Face as 'nvidia/Nemotron-Mini-4B-Instruct', and NVIDIA also hosts the model as a NIM microservice for RTX-class deployment and inside the ACE agent stack.
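Given the Hugging Face model ID above, a minimal local-chat sketch with the `transformers` library might look like the following. It assumes `transformers`, `torch`, and `accelerate` are installed and a GPU with enough memory is available; the prompt text and generation settings are illustrative, not NVIDIA's recommended defaults.

```python
# Minimal sketch: single-turn chat with Nemotron Mini 4B Instruct
# via Hugging Face transformers (assumed environment, see lead-in).
MODEL_ID = "nvidia/Nemotron-Mini-4B-Instruct"

def build_messages(user_text: str) -> list[dict]:
    """Single-turn message list in the standard chat-template shape."""
    return [{"role": "user", "content": user_text}]

def main() -> None:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Render the chat through the model's own template before tokenizing.
    prompt = tokenizer.apply_chat_template(
        build_messages("Give me a one-line NPC greeting."),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                           skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

For production RTX deployment, the NIM microservice path mentioned above replaces this local loading entirely with an OpenAI-style HTTP endpoint.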
Sources
- NVIDIA — Nemotron Mini 4B — accessed 2026-04-20
- Hugging Face — nvidia/Nemotron-Mini-4B-Instruct — accessed 2026-04-20