
Nvidia Claims 10x MoE Inference Gains With New 72‑Chip AI Server

The pivot to large‑scale inference puts Nvidia's hardware strategy under fresh competitive pressure.

A smartphone with a displayed NVIDIA logo is placed on a computer motherboard in this illustration taken March 6, 2023. REUTERS/Dado Ruvic/Illustration

Overview

  • Nvidia released company benchmarks on Dec. 3 showing that its latest server, which links 72 top-tier chips, delivers roughly a tenfold speedup on mixture-of-experts (MoE) inference workloads.
  • Tests included Moonshot AI’s Kimi K2 Thinking model, with Nvidia reporting comparable gains on DeepSeek models.
  • The company credits the gains to the server's dense chip count and the high-speed interconnects linking its accelerators, a design suited to MoE models, whose experts are typically spread across many chips with tokens routed between them during inference.
  • The benchmarks cover inference rather than training, reflecting the industry's shift toward serving trained models to users at scale.
  • AMD is developing a similar multi-chip server slated for next year, and Cerebras is among the rivals contesting the inference market.