NVIDIA released Nemotron 3 Ultra on June 1, 2026, making it the most capable open-weights model to ship from a US lab in over a year. The model scores 48 on the Artificial Analysis Intelligence Index (AAII), compared with 39 for Gemma 4 31B, the next strongest American open-weights model. That nine-point gap matters.
Nemotron 3 Ultra uses a mixture-of-experts architecture with 550B total parameters and 55B active at inference time. The MoE design keeps per-token compute manageable while retaining the representational capacity of a much larger dense model. NVIDIA will release the model in NVFP4, its own 4-bit floating-point quantization format, which is engineered specifically for Hopper and Blackwell hardware.
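The parameter counts above translate into rough numbers with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not an NVIDIA figure: it counts weight storage only, assumes exactly 4 bits per parameter for the NVFP4 build, and ignores scaling-factor metadata, KV cache, and activations. The 16-bit comparison is a hypothetical baseline added for scale, and the helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 550e9   # total parameters, from the article
ACTIVE_PARAMS = 55e9   # parameters active per token, from the article

# Share of the model that participates in each forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Weight-only footprint at 4 bits (assumed for NVFP4) and at a
# hypothetical 16-bit baseline.
fp4_gb = weight_memory_gb(TOTAL_PARAMS, 4)
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)

if __name__ == "__main__":
    print(f"active fraction: {active_fraction:.0%}")
    print(f"4-bit weights:   {fp4_gb:.0f} GB")
    print(f"16-bit weights:  {bf16_gb:.0f} GB")
```

Under these assumptions, about a tenth of the parameters are active per token, and the 4-bit weights take roughly a quarter of the space a 16-bit copy would need. Real deployments will need more memory than the weight-only figure.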
The NVFP4 format deserves a closer look. Because the weights ship quantized in a vendor-specific format, the easiest deployment path runs on NVIDIA silicon, and teams on AMD or custom silicon will face extra friction. That is not unusual for NVIDIA, but it is a format-adoption play, and worth weighing before committing to a deployment stack.
On throughput, Artificial Analysis reported Nemotron 3 Ultra serving at over 300 tokens per second on a pre-release Deep Infra endpoint. At that speed, the model is practical for production agent workloads, where latency compounds across tool calls. Most open-weights models at this parameter count serve far more slowly; the MoE architecture, combined with NVFP4’s reduced memory footprint, is what enables the throughput figure.
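To see how latency compounds across tool calls, consider a sequential agent run in which every step waits for the model to finish generating. The sketch below is illustrative only: the step count, the tokens per step, and the 60 tokens-per-second comparison rate are hypothetical, and the estimate ignores time to first token, network overhead, and tool execution time.

```python
def generation_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Total time spent generating tokens across sequential agent steps."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return steps * tokens_per_step / tokens_per_second


# Hypothetical workload: 20 sequential tool calls, 500 output tokens each.
STEPS = 20
TOKENS_PER_STEP = 500

fast = generation_seconds(STEPS, TOKENS_PER_STEP, 300.0)  # rate reported in the article
slow = generation_seconds(STEPS, TOKENS_PER_STEP, 60.0)   # hypothetical slower endpoint

if __name__ == "__main__":
    print(f"at 300 tok/s: {fast:.1f} s of generation")
    print(f"at  60 tok/s: {slow:.1f} s of generation")
```

For this made-up workload, the difference is roughly half a minute of generation time versus nearly three minutes, which is why per-token speed matters more for agents than for single-turn chat.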
The “most intelligent open-weights model from the US” framing in NVIDIA’s announcement is accurate but carries a built-in qualifier. Chinese open-weights labs have held the top of the open-weights leaderboard for most of 2025 and into 2026. Qwen, DeepSeek, and MiniMax all operate above the AAII 48 mark. Nemotron 3 Ultra is the strongest American entry, not the strongest entry overall.
The US open-weights picture has been thin for several months. Meta’s Llama roadmap has not produced a frontier-competitive release recently, and Mistral sits in the mid-tier. The Nemotron release is the most credible American open-weights push since that gap opened. It also comes from a chipmaker rather than a pure-play model lab, which suggests NVIDIA sees strategic value in shaping the open ecosystem.
Nemotron 3 Ultra benchmarks well on coding, reasoning, and multi-turn tasks according to Artificial Analysis. Independent third-party evaluations beyond AAII have not yet been published as of this writing. The AAII score of 48 places the model well below closed frontier offerings: GPT-5, Claude Opus 4.8, and Gemini 3 Pro all sit above it. For teams that require open weights for compliance, cost, or customization reasons, Nemotron 3 Ultra is the current American ceiling. For teams without that constraint, the closed frontier still leads.
For operators evaluating open-weights options for agent pipelines, the 300-plus tokens-per-second throughput figure on Deep Infra is the most operationally relevant number in this release. Benchmark the NVFP4 build against your actual workload before the full public release lands; throughput at pre-release may shift at general availability.
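One way to run that benchmark is a small harness that replays your own prompts and reports output tokens per second of wall-clock time. The sketch below is provider-agnostic and makes no assumptions about any real client library: `generate` is a hypothetical callable you would wrap around your own endpoint, returning the number of output tokens for a prompt, and `fake_generate` is a stand-in used only to show the harness running.

```python
import time
from typing import Callable, Iterable


def measure_throughput(generate: Callable[[str], int], prompts: Iterable[str]) -> float:
    """Return output tokens per second of wall-clock time across all prompts."""
    total_tokens = 0
    start = time.perf_counter()
    for prompt in prompts:
        # `generate` must block until the full response has been produced.
        total_tokens += generate(prompt)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_tokens / elapsed


def fake_generate(prompt: str) -> int:
    """Stand-in for a real endpoint call: sleeps briefly, reports 10 tokens."""
    time.sleep(0.01)
    return 10


if __name__ == "__main__":
    rate = measure_throughput(fake_generate, ["prompt one", "prompt two", "prompt three"])
    print(f"{rate:.0f} output tokens per second")
```

Running the same prompt set against the pre-release endpoint and again at general availability gives a like-for-like comparison. Because the harness issues requests one at a time, it measures single-stream speed rather than throughput under concurrent load.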
Artificial Analysis published benchmark results and inference performance data for NVIDIA Nemotron 3 Ultra on June 1, 2026, via an X thread mirrored at threadreaderapp.com.