Overview
- Inception Labs says Mercury 2 generates about 1,000 tokens per second and scores 90% on the AIME 2026 reasoning test, according to coverage published on Sunday, June 21, 2026.
- Google DeepMind released DiffusionGemma as an experimental, open-weight model that scored 69.1% on the same AIME test; the company's developer guidance explicitly labels it as lower quality than the standard Gemma 4.
- Mercury 2 is offered as a closed, paid API with published per-token pricing and throughput claims on NVIDIA Blackwell hardware, whereas DiffusionGemma is available on Hugging Face for developers to run and inspect.
- Case studies and coverage highlight immediate use cases (real-time coding, fast subagent calls, and voice interfaces) where parallel diffusion generation can cut latency and inference cost; one Augment Code test reported an 82% latency drop and a 90% cost cut.
- Major open questions remain: how these models perform under independent benchmarking across tasks and hardware, how diffusion affects high-level reasoning at scale, and whether runtimes and agent tooling will adapt to the different GPU and compute patterns that parallel generation requires.