Nemotron Ultra 253B

Nemotron Ultra 253B, released by NVIDIA in 2025, is the largest open-weights model in the Llama Nemotron series. Derived from Llama 3.1 405B through architecture slicing and efficient fine-tuning, it targets enterprise-grade reasoning on code, math, and RAG workloads, and it is optimised for high throughput on NVIDIA GPUs with TensorRT-LLM.

Model specs

Vendor: NVIDIA
Family: Llama Nemotron
Released: 2025-03
Context window: 128,000 tokens
Modalities: text

Strengths

  • Top-tier open-weights reasoning at launch
  • Highly optimised for NVIDIA hardware and TensorRT-LLM
  • Includes curated datasets and open training recipes

Limitations

  • Enormous footprint — multi-node serving required
  • Focused on NVIDIA infrastructure; portability outside the stack is uneven
  • Llama community license imposes some commercial restrictions

Use cases

  • Enterprise reasoning workloads served via NVIDIA NIM
  • Research on frontier-scale open-weights post-training
  • High-throughput inference on H100 and B200 fleets
  • Customisation platform for domain-specific agents

Benchmarks

Benchmark    Score    As of
MMLU-Pro     ≈78%     2025-03
MATH         ≈82%     2025-03
HumanEval    ≈89%     2025-03

Frequently asked questions

What is Nemotron Ultra 253B?

Nemotron Ultra 253B is NVIDIA's top-tier open-weights reasoning model, derived from Llama 3.1 405B via extensive post-training and architectural adjustments for efficient inference on NVIDIA GPUs.

How is Nemotron Ultra different from Llama 3.1 405B?

NVIDIA applied pruning, architecture slicing, and reasoning-focused fine-tuning, producing a 253B-parameter model that matches or beats the 405B base on many evaluations while being substantially cheaper to serve.

Where can I run Nemotron Ultra?

Weights are available on Hugging Face under the 'nvidia' organisation, and NVIDIA offers hosted access through NIM microservices in its AI Foundry.
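As a minimal sketch of hosted access: NIM microservices expose an OpenAI-compatible chat-completions API, so a request is an ordinary JSON POST body. The helper below only assembles that body; the model identifier, the "detailed thinking on/off" system prompt used to toggle reasoning, and the sampling values are assumptions to verify against NVIDIA's model catalog and the model card before use.

```python
import json

# Hypothetical model ID for the NIM catalog entry -- confirm the exact
# string against NVIDIA's documentation before sending requests.
DEFAULT_MODEL = "nvidia/llama-3.1-nemotron-ultra-253b-v1"

def build_chat_request(prompt, model=DEFAULT_MODEL, detailed_thinking=True):
    """Assemble a JSON body for POST <base_url>/v1/chat/completions.

    The system prompt toggling reasoning ("detailed thinking on"/"off")
    follows the convention described on the model card; treat it as an
    assumption if your deployment differs.
    """
    system = "detailed thinking on" if detailed_thinking else "detailed thinking off"
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        # Illustrative sampling settings, not vendor-recommended values.
        "temperature": 0.6,
        "max_tokens": 1024,
    }

body = build_chat_request("Prove that the square root of 2 is irrational.")
print(json.dumps(body, indent=2))
```

Sending the body with any HTTP client (or the `openai` Python SDK pointed at the NIM base URL) then completes the round trip; only the endpoint URL and API key differ between self-hosted NIM and the hosted AI Foundry service.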

Sources

  1. NVIDIA — Llama Nemotron Ultra — accessed 2026-04-20
  2. Hugging Face — nvidia/Llama-3_1-Nemotron-Ultra-253B — accessed 2026-04-20