DFlash: Block Diffusion for Flash Speculative Decoding
Jian Chen, Yesheng Liang et al.
TLDR: DFlash is a speculative decoding framework that drafts tokens in parallel with block diffusion, achieving over 6x faster language model inference than traditional methods.
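
To make the draft-then-verify idea behind speculative decoding concrete, here is a minimal toy sketch. It is not DFlash's implementation: `target_next`, `draft_block`, `speculative_decode`, and `greedy_decode` are hypothetical names, both "models" are small deterministic functions, and `draft_block` only stands in for a drafter that proposes a whole block in one parallel pass (a real block diffusion drafter would be a separate, cheaper network). The sketch assumes greedy decoding and illustrates one property only: accepting the longest drafted prefix that matches the target reproduces the target's own output while using fewer target passes.

```python
from typing import Callable, List, Tuple

Token = int


def target_next(prefix: List[Token]) -> Token:
    """Toy stand-in for the target model's greedy next token (assumption)."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_block(prefix: List[Token], block_size: int) -> List[Token]:
    """Toy stand-in for a block drafter (assumption, not DFlash's drafter).

    It proposes block_size tokens per call and deliberately disagrees with
    the target whenever the context length is 3 modulo 4.
    """
    ctx = list(prefix)
    block = []
    for _ in range(block_size):
        tok = target_next(ctx)
        if len(ctx) % 4 == 3:
            tok = (tok + 1) % 50  # injected drafting error
        block.append(tok)
        ctx.append(tok)
    return block


def speculative_decode(
    prompt: List[Token],
    num_new: int,
    block_size: int,
    target: Callable[[List[Token]], Token],
    drafter: Callable[[List[Token], int], List[Token]],
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify loop; returns (sequence, target passes used)."""
    seq = list(prompt)
    goal = len(prompt) + num_new
    target_passes = 0
    while len(seq) < goal:
        block = drafter(seq, block_size)
        # In a real system one target forward pass scores every drafted
        # position at once; the toy counts that as a single pass.
        target_passes += 1
        accepted: List[Token] = []
        for tok in block:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # correct the first mismatch, stop
                break
            accepted.append(tok)
        else:
            # Whole block accepted: the same pass also yields one extra token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], target_passes


def greedy_decode(
    prompt: List[Token], num_new: int, target: Callable[[List[Token]], Token]
) -> List[Token]:
    """Baseline: one target pass per generated token."""
    seq = list(prompt)
    for _ in range(num_new):
        seq.append(target(seq))
    return seq


if __name__ == "__main__":
    out, passes = speculative_decode([1, 2, 3], 16, 4, target_next, draft_block)
    print("matches greedy baseline:", out == greedy_decode([1, 2, 3], 16, target_next))
    print("target passes used:", passes, "for 16 new tokens")
```

In this toy, each verification pass always contributes at least one token, so the loop never needs more target passes than plain greedy decoding. Any real-world speedup depends on how often the drafter's tokens are accepted and on how cheap drafting a block is.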