Particle.news

Nvidia Unveils Nemotron 3 Ultra, a 550B Open-Weight AI Model

Publishing weights and training recipes, Nvidia aims to cut inference costs and speed research by using a mixture-of-experts design with NVFP4-driven throughput gains.

Overview

  • Nvidia unveiled Nemotron 3 Ultra on Monday at Computex in Taipei. It is a roughly 550-billion-parameter open-weight model designed for advanced reasoning, planning, and agentic workflows.
  • Ultra uses mixture-of-experts routing, so only a fraction of its parameters run per request, and combines it with NVFP4 training, Mamba-2 layers, and multi-token prediction to support a 1‑million‑token context window and lower active compute.
  • Nvidia says the new techniques can deliver up to 5x higher throughput than prior versions. On a pre-release DeepInfra endpoint, the model produced more than 300 output tokens per second. The company said Nemotron 3 Ultra will ship on June 4.
  • An independent evaluation by Artificial Analysis placed Nemotron 3 Ultra at 48 on its Intelligence Index, behind Moonshot AI’s Kimi K2.6 at 54. Decrypt reported that Ultra delivers higher throughput than several leading Chinese models but currently trails the top Chinese open models on raw intelligence.
  • The launch is part of a broader push: Nvidia disclosed a $26 billion five-year plan for open-weight AI, reported roughly 50 million prior downloads of the Nemotron 3 family, formed an eight-lab Nemotron Coalition, and said work on Nemotron 4 is already underway.
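
For readers unfamiliar with the mixture-of-experts routing mentioned in the overview, the idea that "only a fraction of parameters run per request" can be sketched in a few lines of Python. This is a generic top-k routing toy, not Nvidia's implementation: the expert count, the value of k, the scalar "experts", and the function names `top_k_route` and `moe_forward` are all hypothetical and chosen only for illustration.

```python
import math
import random

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected experts' logits only, so weights sum to 1.
    m = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the k selected experts' outputs; the rest are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Toy setup (assumed values): 8 experts, each a scalar function, 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]

# In a real model the gate logits come from a learned router; here they are random.
rng = random.Random(0)
gate_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(gate_logits, TOP_K)
output = moe_forward(1.0, experts, gate_logits, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts that ran for this token
```

In this toy, each token touches 2 of 8 experts, so a quarter of the expert parameters are active per step. That is the general mechanism by which a very large model can keep per-request compute well below what its total parameter count would suggest; the actual ratio for Nemotron 3 Ultra is not stated in the summary above.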