Two new preprints describe practical methods that let diffusion and hybrid-attention models use autoregressive token verification to cut generation latency.