
Nemotron Mini 4B Instruct

Nemotron Mini 4B, released in September 2024 alongside NVIDIA's ACE game-character and on-device stack, is a 4-billion-parameter instruct model derived from the larger Nemotron-4 15B via NVIDIA's Minitron pruning-and-distillation pipeline. It is tuned for fast chat and function calling on consumer RTX GPUs and inside NVIDIA's Riva/ACE agents.
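The distillation half of that pipeline trains the small student to match the big teacher's output distribution. As a toy illustration of the core objective (illustrative only, not NVIDIA's training code; function names and the temperature value are assumptions), a temperature-softened KL loss looks like this:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the vocabulary axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions --
    # the standard logit-distillation term a Minitron-style student minimises.
    p = softmax(teacher_logits, T)  # teacher target distribution
    q = softmax(student_logits, T)  # student distribution
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
```

The loss is zero when the student's distribution matches the teacher's and grows as they diverge; pruning first shrinks the network, then this objective recovers quality from the teacher.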

Model specs

Vendor
NVIDIA
Family
Nemotron Mini
Released
2024-09
Context window
4,096 tokens
Modalities
text

Strengths

  • 4B footprint runs on consumer GPUs and Jetson devices
  • Fast inference with TensorRT-LLM and NIM microservices
  • Good function-calling for its size

Limitations

  • Small capacity limits reasoning and long-context performance
  • Optimised for the NVIDIA stack; portability is secondary
  • Distributed under an NVIDIA community license that carries some usage restrictions

Use cases

  • On-device chat on RTX workstations and laptops
  • NPC and game-character dialogue via NVIDIA ACE
  • Low-latency function-calling agents
  • Classroom demos of compact instruction-tuned LLMs
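A function-calling agent around a small model like this needs glue code that parses the model's tool call and executes it. The exact tool-call syntax Nemotron Mini emits is set by its prompt template, so the sketch below is only the agent-side plumbing, assuming a hypothetical JSON output with "name" and "arguments" keys (the tool registry is likewise illustrative):

```python
import json

# Hypothetical tool registry -- names and signatures are illustrative,
# not part of the Nemotron Mini spec.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch_tool_call(model_output: str) -> str:
    """Parse a JSON tool call from the model and run the matching tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain-text answer, no tool call to run
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return f"unknown tool: {call.get('name')}"
    # Hand the model-supplied arguments to the tool implementation.
    return fn(**call.get("arguments", {}))
```

In a real agent the tool result would be fed back into the chat history for a follow-up generation; the low latency of a 4B model is what makes that round trip feel interactive.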

Benchmarks

Benchmark   | Score | As of
MMLU        | ≈56%  | 2024-09
HumanEval   | ≈45%  | 2024-09

Frequently asked questions

What is Nemotron Mini 4B Instruct?

Nemotron Mini 4B Instruct is NVIDIA's 4-billion-parameter on-device chat model, derived from Nemotron-4 15B via Minitron pruning and distillation and tuned for function calling and character dialogue.

What is Minitron?

Minitron is NVIDIA's pruning-and-distillation recipe that turns larger LLMs into smaller ones while preserving most of the quality. Nemotron Mini 4B is one of its headline results.
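The pruning half of the recipe removes low-importance structure (neurons, heads, layers) before distillation restores quality. As a toy sketch of width pruning only (the importance score here, mean absolute activation, is a stand-in for Minitron's actual criterion, and the function is hypothetical):

```python
import numpy as np

def prune_neurons(W, activations, keep: int):
    """Drop the least-important output neurons of a linear layer.

    W: (out_features, in_features) weight matrix
    activations: (batch, out_features) sample activations of this layer
    keep: number of output neurons to retain
    """
    # Score each neuron; toy stand-in for Minitron-style importance scores.
    importance = np.abs(activations).mean(axis=0)
    keep_idx = np.argsort(importance)[-keep:]  # indices of the top-k neurons
    keep_idx.sort()                            # preserve original neuron order
    return W[keep_idx], keep_idx
```

Repeating this across layers (and over depth and attention heads), then distilling against the original model, is the general shape of how a 15B teacher becomes a 4B student.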

Where can I use Nemotron Mini?

Weights are published on Hugging Face under 'nvidia/Nemotron-Mini-4B-Instruct', and NVIDIA also packages the model as a NIM microservice for RTX-class deployment and inside the ACE agent stack.
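A minimal sketch of running the Hugging Face weights with the transformers library follows. The repo id comes from the model card above; the generation parameters and helper names are assumptions, so check the card's documented prompt format before relying on this. A small budget check against the 4,096-token context window is included, since that limit binds quickly in chat use:

```python
CONTEXT_WINDOW = 4096  # tokens, per the spec above

def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Check a request against the 4,096-token context window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW

def generate_reply(messages, max_new_tokens=256):
    """Minimal chat helper for nvidia/Nemotron-Mini-4B-Instruct (sketch).

    transformers is imported lazily so the rest of this file can be used
    without the heavy dependency installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "nvidia/Nemotron-Mini-4B-Instruct"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # The repo ships a chat template; apply_chat_template formats the turns
    # the way the model was instruction-tuned to see them.
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling `generate_reply([{"role": "user", "content": "Hi"}])` would download roughly 8 GB of weights on first use, so on laptops a quantised build or the NIM endpoint is usually the more practical route.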

Sources

  1. NVIDIA — Nemotron Mini 4B — accessed 2026-04-20
  2. Hugging Face — nvidia/Nemotron-Mini-4B-Instruct — accessed 2026-04-20