Two new preprints present practical methods that let diffusion and hybrid-attention models use autoregressive token verification to cut generation latency.