Overview
- Google DeepMind published DiffusionGemma as open weights under an Apache 2.0 license on June 10, 2026, making the model immediately available for download and experimentation.
- DiffusionGemma replaces token-by-token decoding with diffusion-style parallel generation that denoises up to 256 tokens per step, letting the model refine entire text blocks at once.
- NVIDIA published optimized builds and benchmarks showing roughly 3–4x faster single-user throughput on its hardware, with reported figures of about 1,000 tokens/sec on a single H100 and higher local speeds on DGX Station and RTX GPUs.
- The release includes day-zero support across Hugging Face, vLLM, and Unsloth, plus tools for local fine-tuning, but DeepMind and NVIDIA caution that DiffusionGemma trades some output quality for speed compared with standard Gemma 4.
- Because the published throughput was measured in controlled setups, practical adoption will depend on broader real-world testing; teams should validate coherence, safety, and integration into production pipelines before deploying at scale.
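
To make the contrast between token-by-token decoding and block-parallel denoising concrete, here is a toy sketch in Python. It is not DiffusionGemma's algorithm or API: the function name `toy_parallel_denoise`, the `_` mask symbol, and the random reveal schedule are all assumptions made for illustration, and the `target` string stands in for the predictions a real model would produce. The only idea it shows is that a whole block starts out masked and many positions are filled in per step, so the number of passes is set by the step count rather than by the block length.

```python
import random

MASK = "_"  # hypothetical placeholder for a not-yet-denoised position


def toy_parallel_denoise(target, steps, seed=0):
    """Reveal a fully masked block over a fixed number of steps.

    `target` stands in for a model's predictions. A real diffusion model
    would re-predict the whole block at every step; this toy only
    illustrates that several positions are resolved per step.
    """
    rng = random.Random(seed)
    block = [MASK] * len(target)
    hidden = list(range(len(target)))
    rng.shuffle(hidden)  # arbitrary reveal order (an assumption)

    history = []
    for step in range(steps):
        remaining_steps = steps - step
        # Ceiling division so every position is filled by the last step.
        count = -(-len(hidden) // remaining_steps)
        for _ in range(count):
            pos = hidden.pop()
            block[pos] = target[pos]
        history.append("".join(block))
    return history


if __name__ == "__main__":
    for snapshot in toy_parallel_denoise("parallel", steps=4):
        print(snapshot)
```

With an 8-character block and `steps=4`, the loop runs four passes instead of the eight that one-token-at-a-time decoding would need. How many denoising steps a real model uses per block, and what that costs in output quality, is not stated in the summary above and would need to be measured.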