
Nvidia Claims 10x MoE Inference Gains With New 72‑Chip AI Server

The pivot to large‑scale inference puts Nvidia's hardware strategy under fresh competitive pressure.

A smartphone with a displayed NVIDIA logo is placed on a computer motherboard in this illustration taken March 6, 2023. REUTERS/Dado Ruvic/Illustration

Overview

  • Nvidia released company benchmarks on Dec. 3 showing that its latest server, which links 72 top-tier chips, delivers roughly a tenfold speedup on mixture-of-experts (MoE) inference workloads.
  • Tests included Moonshot AI’s Kimi K2 Thinking model, with Nvidia reporting comparable gains on DeepSeek models.
  • The company credits the gains to the server's dense chip count and the high-speed interconnects linking its accelerators, a design suited to MoE models, whose experts are typically spread across many chips with tokens routed between them during inference.
  • The benchmarks cover inference rather than training, reflecting the industry's shift toward serving trained models to users at scale.
  • AMD is developing a similar multi-chip server slated for next year, and Cerebras is among the rivals contesting the inference market.