Particle News: DeepMind Releases DiffusionGemma for Faster Local Text Generation

Overview

Google DeepMind published DiffusionGemma as open weights under an Apache 2.0 license on June 10, 2026, making the model immediately available for download and experimentation.
Instead of autoregressive, token-by-token decoding, DiffusionGemma uses diffusion-style parallel generation: it denoises up to 256 tokens per step, refining whole blocks of text at once.
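To illustrate the general idea of parallel, block-wise generation, here is a toy sketch of confidence-based masked-diffusion decoding. It is not DiffusionGemma's actual sampler. `toy_predict` is a hypothetical stand-in for the model that returns random tokens and random confidence scores, and the 256-token block and 16-step schedule are illustrative assumptions. Each step makes one parallel pass over every still-masked position and commits the most confident guesses.

```python
import random

MASK = None  # marks a position that has not been decided yet


def toy_predict(block, vocab, rng):
    """Hypothetical stand-in for a denoising model: in one parallel pass,
    propose a (token, confidence) pair for every masked position."""
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(block)
        if tok is MASK
    }


def diffusion_decode(block_size, steps, vocab, seed=0):
    """Fill a fully masked block within a fixed budget of refinement steps.

    Returns the finished block and the number of model passes used."""
    rng = random.Random(seed)
    block = [MASK] * block_size
    passes = 0
    for step in range(steps):
        if not any(tok is MASK for tok in block):
            break  # everything is already decided
        proposals = toy_predict(block, vocab, rng)
        passes += 1
        remaining_steps = steps - step
        # Commit an even share of the remaining masked positions; on the
        # last step this share is everything that is left.
        k = -(-len(proposals) // remaining_steps)  # ceiling division
        best = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)[:k]
        for i, (tok, _conf) in best:
            block[i] = tok
    return block, passes


block, passes = diffusion_decode(256, 16, ["the", "a", "model", "text"])
print(len(block), passes)
```

In this sketch, 256 tokens are produced in 16 model passes, whereas autoregressive decoding needs one pass per generated token (256 for the same block). Fewer steps commit more tokens per pass, which is the kind of speed-versus-quality trade-off the release notes describe.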
NVIDIA published optimized builds and benchmarks showing roughly 3–4x faster single-user throughput on its hardware. Reported figures include about 1,000 tokens/sec on a single H100, with higher local speeds on DGX Station and RTX GPUs.
The release ships with day-zero support in Hugging Face, vLLM, and Unsloth, along with tools for local fine-tuning. DeepMind and NVIDIA caution, however, that DiffusionGemma trades some output quality for speed compared with standard Gemma 4.
Practical adoption will depend on broader real-world testing, because the published throughput figures were measured in controlled setups. Teams should validate coherence, safety, and integration with their production pipelines before deploying at scale.