
Nemotron Mini 4B Instruct

Nemotron Mini 4B, released in September 2024 alongside NVIDIA's ACE game-character and on-device stack, is a 4-billion-parameter instruct model derived from the larger Nemotron-4 15B via NVIDIA's Minitron pruning-and-distillation pipeline. It is tuned for fast chat and function calling on consumer RTX GPUs and inside NVIDIA's Riva/ACE agents.
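The distillation half of that pipeline trains the small student to match the big teacher's output distribution. As a toy illustration of the core objective (illustrative only, not NVIDIA's training code; function names and the temperature value are assumptions), a temperature-softened KL loss looks like this:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the vocabulary axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions --
    # the standard logit-distillation term a Minitron-style student minimises.
    p = softmax(teacher_logits, T)  # teacher target distribution
    q = softmax(student_logits, T)  # student distribution
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
```

The loss is zero when the student's distribution matches the teacher's and grows as they diverge; pruning first shrinks the network, then this objective recovers quality from the teacher.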

Model specs

Vendor
NVIDIA
Family
Nemotron Mini
Released
2024-09
Context window
4,096 tokens
Modalities
text

Strengths

  • 4B footprint runs on consumer GPUs and Jetson devices
  • Fast inference with TensorRT-LLM and NIM microservices
  • Good function-calling for its size

Limitations

  • Small capacity limits reasoning and long-context performance
  • Optimised for the NVIDIA stack; portability is secondary
  • Distributed under an NVIDIA community license that carries some usage restrictions

Use cases

  • On-device chat on RTX workstations and laptops
  • NPC and game-character dialogue via NVIDIA ACE
  • Low-latency function-calling agents
  • Classroom demos of compact instruction-tuned LLMs
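A function-calling agent around a small model like this needs glue code that parses the model's tool call and executes it. The exact tool-call syntax Nemotron Mini emits is set by its prompt template, so the sketch below is only the agent-side plumbing, assuming a hypothetical JSON output with "name" and "arguments" keys (the tool registry is likewise illustrative):

```python
import json

# Hypothetical tool registry -- names and signatures are illustrative,
# not part of the Nemotron Mini spec.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch_tool_call(model_output: str) -> str:
    """Parse a JSON tool call from the model and run the matching tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain-text answer, no tool call to run
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return f"unknown tool: {call.get('name')}"
    # Hand the model-supplied arguments to the tool implementation.
    return fn(**call.get("arguments", {}))
```

In a real agent the tool result would be fed back into the chat history for a follow-up generation; the low latency of a 4B model is what makes that round trip feel interactive.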

Benchmarks

Benchmark   | Score | As of
MMLU        | ≈56%  | 2024-09
HumanEval   | ≈45%  | 2024-09

Frequently asked questions

What is Nemotron Mini 4B Instruct?

Nemotron Mini 4B Instruct is NVIDIA's 4-billion-parameter on-device chat model, derived from Nemotron-4 15B via Minitron pruning and distillation and tuned for function calling and character dialogue.

What is Minitron?

Minitron is NVIDIA's pruning-and-distillation recipe that turns larger LLMs into smaller ones while preserving most of the quality. Nemotron Mini 4B is one of its headline results.
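The pruning half of the recipe removes low-importance structure (neurons, heads, layers) before distillation restores quality. As a toy sketch of width pruning only (the importance score here, mean absolute activation, is a stand-in for Minitron's actual criterion, and the function is hypothetical):

```python
import numpy as np

def prune_neurons(W, activations, keep: int):
    """Drop the least-important output neurons of a linear layer.

    W: (out_features, in_features) weight matrix
    activations: (batch, out_features) sample activations of this layer
    keep: number of output neurons to retain
    """
    # Score each neuron; toy stand-in for Minitron-style importance scores.
    importance = np.abs(activations).mean(axis=0)
    keep_idx = np.argsort(importance)[-keep:]  # indices of the top-k neurons
    keep_idx.sort()                            # preserve original neuron order
    return W[keep_idx], keep_idx
```

Repeating this across layers (and over depth and attention heads), then distilling against the original model, is the general shape of how a 15B teacher becomes a 4B student.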

Where can I use Nemotron Mini?

Weights are published on Hugging Face under 'nvidia/Nemotron-Mini-4B-Instruct', and NVIDIA also packages the model as a NIM microservice for RTX-class deployment and inside the ACE agent stack.
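A minimal sketch of running the Hugging Face weights with the transformers library follows. The repo id comes from the model card above; the generation parameters and helper names are assumptions, so check the card's documented prompt format before relying on this. A small budget check against the 4,096-token context window is included, since that limit binds quickly in chat use:

```python
CONTEXT_WINDOW = 4096  # tokens, per the spec above

def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Check a request against the 4,096-token context window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW

def generate_reply(messages, max_new_tokens=256):
    """Minimal chat helper for nvidia/Nemotron-Mini-4B-Instruct (sketch).

    transformers is imported lazily so the rest of this file can be used
    without the heavy dependency installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "nvidia/Nemotron-Mini-4B-Instruct"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # The repo ships a chat template; apply_chat_template formats the turns
    # the way the model was instruction-tuned to see them.
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling `generate_reply([{"role": "user", "content": "Hi"}])` would download roughly 8 GB of weights on first use, so on laptops a quantised build or the NIM endpoint is usually the more practical route.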

Sources

  1. NVIDIA — Nemotron Mini 4B — accessed 2026-04-20
  2. Hugging Face — nvidia/Nemotron-Mini-4B-Instruct — accessed 2026-04-20