Overview
- NVIDIA published first-round AgentPerf results on June 12, 2026 showing its GB300 NVL72 rack can run up to 20 times more concurrent agentic AI sessions per megawatt than the prior HGX H200 system.
- The benchmark, AgentPerf from Artificial Analysis, models chained agent workflows using the DeepSeek V4 Pro mixture-of-experts model and simulates external tool calls so the test isolates accelerated compute and power efficiency.
- NVIDIA credits the gain to full-stack co-design: a 72‑GPU rack topology, faster NVLink interconnects, FP4 precision and a second‑generation Transformer Engine, CUDA optimizations that overlap compute and communication, and TensorRT-LLM software.
- The efficiency leap could change data‑center economics by letting operators scale agentic services within existing power limits, and it has already prompted some partners to deploy Blackwell for production agent workloads.
- Results still need broader independent replication because the benchmark simulates tool calls and earlier third‑party analyses reported different magnitudes of improvement, with some finding up to 50x throughput gains per megawatt.
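
The headline figure is a ratio of concurrent sessions normalized by power draw. The sketch below shows that arithmetic so the "times more per megawatt" claim is easier to check against any published numbers. The function names and every input figure are hypothetical placeholders chosen for illustration; they are not AgentPerf measurements and not NVIDIA's or Artificial Analysis's methodology.

```python
def sessions_per_megawatt(concurrent_sessions: float, power_kw: float) -> float:
    """Normalize a concurrent-session count to one megawatt of power draw."""
    if power_kw <= 0:
        raise ValueError("power_kw must be positive")
    # 1 MW = 1000 kW, so scale the session count by 1000 / power_kw.
    return concurrent_sessions * 1000.0 / power_kw


def efficiency_ratio(new_spm: float, old_spm: float) -> float:
    """How many times more sessions per megawatt the newer system sustains."""
    if old_spm <= 0:
        raise ValueError("old_spm must be positive")
    return new_spm / old_spm


# Hypothetical placeholder inputs, NOT measured benchmark results.
old_spm = sessions_per_megawatt(concurrent_sessions=100, power_kw=50)
new_spm = sessions_per_megawatt(concurrent_sessions=4800, power_kw=120)
ratio = efficiency_ratio(new_spm, old_spm)

print(f"old: {old_spm:.0f} sessions/MW, new: {new_spm:.0f} sessions/MW, ratio: {ratio:.1f}x")
```

Normalizing by power rather than by GPU count is what lets a 72-GPU rack be compared with a smaller system: under a fixed facility power budget, the ratio translates directly into how many more sessions fit.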