Overview
- Ant Technology Research Institute introduced LLaDA2.0, a discrete‑diffusion LLM family built on Mixture‑of‑Experts (MoE) architectures, with a 16B “mini” and a 100B “flash” variant.
- The 100B model is described by Ant as the industry’s first diffusion language model at that parameter scale.
- Model weights and training code are open-sourced on Hugging Face to facilitate independent evaluation.
- Training adopts a Warmup‑Stable‑Decay (WSD) learning‑rate schedule to reuse knowledge from autoregressive pretraining, complemented by confidence‑aware parallel training and a diffusion‑form variant of Direct Preference Optimization (DPO).
- Ant reports roughly 2.1× faster inference from parallel decoding, along with strong results on structured generation such as code; both claims are pending external verification.
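A Warmup‑Stable‑Decay schedule, as mentioned above, is a piecewise learning‑rate curve: linear warmup, a long constant plateau, then an anneal. The sketch below is illustrative only; the peak rate, phase lengths, and decay shape are assumptions, not Ant's published settings.

```python
def wsd_lr(step, peak_lr=3e-4, warmup=1000, stable=8000, decay=1000, min_lr=3e-5):
    """Warmup-Stable-Decay schedule: linear warmup, constant plateau, linear anneal.
    All hyperparameters are placeholder values for illustration."""
    if step < warmup:                       # linear ramp from 0 to peak_lr
        return peak_lr * step / warmup
    if step < warmup + stable:              # hold at peak_lr for the bulk of training
        return peak_lr
    t = min(step - warmup - stable, decay)  # linear anneal down to min_lr, then hold
    return peak_lr + (min_lr - peak_lr) * t / decay

print(wsd_lr(500))    # mid-warmup: half of peak_lr
print(wsd_lr(5000))   # plateau: peak_lr
print(wsd_lr(10000))  # after decay: min_lr
```

The long stable phase is what makes it natural to branch off a checkpoint mid‑plateau and continue training under a different objective, which is one common motivation for choosing WSD when adapting an existing model.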
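The reported speedup comes from unmasking several positions per step instead of one. A minimal sketch of threshold‑based parallel decoding follows; the toy proposals and confidence scores are hard‑coded stand‑ins for model outputs, and the function name and threshold are hypothetical, not LLaDA2.0's actual decoder.

```python
MASK = "<mask>"

def decode_step(tokens, confidences, proposals, threshold=0.9):
    """One confidence-aware parallel decoding step (toy sketch):
    unmask every masked position whose confidence clears the threshold,
    falling back to the single most confident position if none does."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    ready = [i for i in masked if confidences[i] >= threshold]
    if not ready:  # guarantee progress: take the best single position
        ready = [max(masked, key=lambda i: confidences[i])]
    for i in ready:
        tokens[i] = proposals[i]
    return tokens, len(ready)

# Two of the three masked positions are confident enough to fill at once.
tokens = [MASK, "=", MASK, MASK]
conf   = [0.95, 1.0, 0.40, 0.92]
prop   = ["x", "=", "y", "+"]
tokens, n = decode_step(tokens, conf, prop)
print(tokens, n)
```

Raising the threshold trades speed for caution: fewer tokens are committed per step, but each commitment is made at higher model confidence, which is the basic lever behind such parallel‑decoding speedups.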