NVIDIA's Blackwell Ultra handles 20x more agents per megawatt than Hopper

The first agentic AI benchmark puts rack-level power efficiency, not peak throughput, at the center of the infrastructure conversation.

Alessandro Benigni

PUBLISHED JUN 16, 2026

NVIDIA reported on its company blog late last week that its GB300 NVL72 platform runs 20 times as many concurrent agents per megawatt as the older HGX H200 system. The metric comes from AgentPerf, a benchmark developed by Artificial Analysis that measures how many simultaneous agentic tasks a platform can sustain while meeting defined response-rate thresholds.

The benchmark uses DeepSeek V4 Pro, a large mixture-of-experts model, as its workload proxy. Rather than measuring single-prompt latency, AgentPerf simulates real coding-agent trajectories: file reads, writes, command execution, and context growth across dozens of chained calls.
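Context growth is the part of that workload that single-prompt tests miss, and a small Python sketch can show why. This is not AgentPerf's implementation: the `Step` type, the `context_sizes` function, and every token count below are hypothetical, chosen only to show how each chained call sees a longer context than the one before it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str           # "read", "write", or "exec" (illustrative labels)
    output_tokens: int  # tokens this step appends to the context


def context_sizes(initial_tokens: int, steps: list[Step]) -> list[int]:
    """Return the context length the model sees at each chained call."""
    sizes = []
    context = initial_tokens
    for step in steps:
        sizes.append(context)          # this call sees everything so far
        context += step.output_tokens  # its result is carried into the next call
    return sizes


# Hypothetical 30-call trajectory; token counts are made up for illustration.
trajectory = [Step("read", 800), Step("exec", 200), Step("write", 400)] * 10
sizes = context_sizes(1_000, trajectory)
print(f"calls={len(sizes)} first={sizes[0]} last={sizes[-1]}")
```

Under these made-up numbers the last call processes a context many times larger than the first, which is why a benchmark built on chained calls can stress a platform differently from one built on isolated prompts.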

Several caveats apply: the benchmark is new, the results were published on NVIDIA’s own blog, and Artificial Analysis has commercial relationships with infrastructure providers. No independent replication has been reported. The 20x figure is a comparison against NVIDIA’s own prior generation, not against AMD, Intel, or cloud-provider alternatives.

For teams sizing agent infrastructure, agents per megawatt is the number that sets how many agents a site can run once power capacity, not chip count, becomes the ceiling. Teams planning rack purchases for 2026 agent deployments should request AgentPerf figures from all vendors before committing, and verify that the benchmark's workload profile matches their own agent call patterns.
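The power-ceiling arithmetic can be sketched in a few lines of Python. The `max_agents` function, the 2 MW budget, and the baseline of 50 agents per megawatt are hypothetical placeholders, not AgentPerf results; only the 20x ratio comes from NVIDIA's reported comparison.

```python
import math


def max_agents(power_budget_mw: float, agents_per_mw: float) -> int:
    """Agents a site can sustain when power is the binding limit."""
    if power_budget_mw < 0 or agents_per_mw < 0:
        raise ValueError("inputs must be non-negative")
    return math.floor(power_budget_mw * agents_per_mw)


# Hypothetical inputs for illustration only (not published figures).
POWER_BUDGET_MW = 2.0
BASELINE_AGENTS_PER_MW = 50.0  # placeholder for the older platform
REPORTED_RATIO = 20.0          # the 20x claim from NVIDIA's post

baseline = max_agents(POWER_BUDGET_MW, BASELINE_AGENTS_PER_MW)
newer = max_agents(POWER_BUDGET_MW, BASELINE_AGENTS_PER_MW * REPORTED_RATIO)
print(f"baseline={baseline} newer={newer}")
```

Swapping in a vendor's measured agents-per-megawatt figure and a site's actual power budget gives a first-pass ceiling, provided the vendor's workload resembles the team's own.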

Source: NVIDIA company blog, published June 13, 2026.
