NVIDIA Releases Nemotron 3 Ultra: Open 550B MoE Model Built for Long-Running Agents

NVIDIA launched Nemotron 3 Ultra, a fully open 550B-parameter Mixture-of-Experts model optimized for long-running agentic workflows. It delivers 5x faster inference and 30% lower task cost, with weights, data, and training recipes released under the Linux Foundation's OpenMDW-1.1 license.

TL;DR — NVIDIA released Nemotron 3 Ultra, a 550B-parameter (55B active) hybrid Mamba-Transformer MoE model. It is 5x faster than comparable models and reduces agentic task costs by 30%. Weights, training data, and recipes are fully open under the Linux Foundation's OpenMDW-1.1 license.

Built for the Agentic Era

NVIDIA's framing for Nemotron 3 Ultra is unambiguous: "AI is no longer just a thing you ask a question to. Now it's an agent that works on your behalf." Coding agents plan, write, test, debug, and iterate across large codebases. Research agents search, evaluate, cross-reference, and synthesize across hundreds of sources. These workflows run for hours — and Nemotron 3 Ultra was designed from scratch to handle them efficiently.

550B Total parameters (55B active)

5× Faster inference vs. comparable models

30% Lower cost on agentic workloads

10+ Domain-specific teacher models in distillation

Technical Architecture

Hybrid Mamba-Transformer Layers

The core architectural innovation is a hybrid of Transformer and Mamba (SSM) layers. State space models handle long-context sequences with significantly lower memory and compute than pure attention, making Nemotron 3 Ultra practical for agentic sessions that span millions of tokens.

LatentMoE for Expert Routing

LatentMoE enables four times as many experts to be available at the same inference cost as conventional MoE routing. The result: more domain-specialized capacity without sacrificing speed.

NVFP4 Cross-Architecture Quantization

A single NVFP4 checkpoint runs on NVIDIA Hopper, Blackwell, and Ampere GPUs. On Blackwell, it delivers up to 5x higher throughput versus BF16 at equivalent interactivity levels.

Multi-Teacher On-Policy Distillation

Nemotron 3 Ultra is trained with dense feedback from over ten domain-specific teacher models. The full data pipeline, training recipes, and weights are released openly, enabling fine-tuning for any specialized domain.

💡

Try It Now
Nemotron 3 Ultra is available today on Perplexity Pro, OpenRouter, Anaconda, and build.nvidia.com. It is packaged as an NVIDIA NIM microservice for cloud, on-premises, or edge deployment.

Benchmark Performance

On SWE-bench and Terminal Bench 2.0, Nemotron 3 Ultra completed agentic benchmarks using fewer total tokens and fewer tokens per turn than comparable models.

Attribute	Nemotron 3 Ultra	Comparable Open Models
Inference speed	5× faster	Baseline
Agentic task cost	30% lower	Baseline
GPU compatibility	Hopper, Blackwell, Ampere	Varies
License	Fully open (OpenMDW-1.1)	Mostly restricted
Weights open	✅	Mostly ❌

ℹ️

What is OpenMDW-1.1?
The Linux Foundation's permissive license purpose-built for AI model distributions. It covers architecture, parameters, documentation, software, and related artifacts under one framework, and permits commercial use and redistribution after fine-tuning.

Agent Framework Integration

Nemotron 3 Ultra integrates natively with NVIDIA's NemoClaw secure runtime and Hermes agent harness. Swapping in the model in OpenCode or Hermes requires a single-line JSON config change. NVIDIA also released cookbooks and Hugging Face model cards to help teams start in minutes.

ℹ️

The Nemotron Coalition
Rather than building the model in isolation, NVIDIA formed the Nemotron Coalition — a group of partner companies that jointly contribute data and evaluations before each release. Nemotron 4 is already in development.

Key Takeaways

Nemotron 3 Ultra is a 550B/55B hybrid Mamba-Transformer MoE model optimized for long-running agentic tasks.
5x faster inference and 30% lower cost versus comparable models on agentic benchmarks.
NVFP4 quantization runs one checkpoint across Hopper, Blackwell, and Ampere GPUs.
Fully open: weights, data, and training recipes released under OpenMDW-1.1.
Available now on Perplexity Pro, OpenRouter, build.nvidia.com, and as an NVIDIA NIM microservice.

🔗

Official Sources & Resources
— NVIDIA NeMo GitHub — official training and fine-tuning framework
— Megatron-LM GitHub — large-scale distributed training library underlying Nemotron
— NeMo Guardrails GitHub — safety controls and guardrail library for agents