ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
ReFusion, developed by GSAI-ML, is a text generation model that combines a masked diffusion model with parallel autoregressive decoding. The architecture aims to improve both task performance and inference efficiency.
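To illustrate the masked-diffusion side of this design, the sketch below shows generic confidence-based iterative unmasking. Every position starts masked, and at each step the model's most confident predictions are committed wherever they sit in the sequence. This is not ReFusion's actual decoding algorithm, which adds parallel autoregressive decoding and KV cache reuse. `toy_model`, `masked_diffusion_decode`, and the fixed tokens-per-step schedule are hypothetical illustrations.

```python
import math
from typing import Callable, List, Optional, Tuple

Seq = List[Optional[int]]  # None marks a still-masked position


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(seq: Seq, vocab_size: int) -> List[List[float]]:
    """Hypothetical stand-in for a masked diffusion model.

    Predicts token id i % vocab_size at position i, with confidence that
    grows with i, so the decoder ends up filling positions right to left.
    """
    logits = []
    for i in range(len(seq)):
        row = [0.0] * vocab_size
        row[i % vocab_size] = float(i + 1)
        logits.append(row)
    return logits


def masked_diffusion_decode(
    length: int,
    vocab_size: int,
    steps: int,
    model: Callable[[Seq, int], List[List[float]]],
) -> Tuple[List[int], List[List[int]]]:
    """Hypothetical generic MDM decoder: unmask the most confident positions each step."""
    seq: Seq = [None] * length
    per_step = math.ceil(length / steps)  # tokens committed per step
    order: List[List[int]] = []  # positions committed at each step
    while any(tok is None for tok in seq):
        logits = model(seq, vocab_size)
        candidates = []
        for i, tok in enumerate(seq):
            if tok is None:
                probs = softmax(logits[i])
                best = max(range(vocab_size), key=lambda v: probs[v])
                candidates.append((probs[best], i, best))
        # Commit the most confident masked positions, regardless of their index.
        candidates.sort(reverse=True)
        chosen = candidates[:per_step]
        for _, i, tok in chosen:
            seq[i] = tok
        order.append(sorted(i for _, i, _ in chosen))
    return [tok for tok in seq if tok is not None], order


tokens, order = masked_diffusion_decode(6, 4, 3, toy_model)
print(tokens, order)
```

With `steps=3` over 6 positions, the toy model's confidence grows with position, so positions 4 and 5 are filled first, then 2 and 3, then 0 and 1, which illustrates non-left-to-right generation.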
Key Capabilities & Innovations
- Masked Diffusion Model (MDM): Generates text by iteratively filling in masked positions instead of emitting one token at a time from left to right, allowing more flexible and potentially higher-quality outputs than purely autoregressive models.
- Full KV Cache Reuse: Fully reuses the key-value (KV) cache during decoding, avoiding redundant recomputation of attention states and speeding up inference.
- Any-Order Generation Support: Whereas standard autoregressive models generate tokens strictly left to right, ReFusion can generate tokens in any order, giving more flexibility in the decoding process.
- Parallel Autoregressive Decoding: Integrates parallel decoding within its diffusion framework, aiming to accelerate generation while maintaining coherence and quality.
- Gumbel Noise Integration: Adds Gumbel noise during sampling, controlled by a configurable temperature parameter; this choice affects both perplexity and generation quality.
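As a rough illustration of temperature-controlled Gumbel noise, here is the standard Gumbel-max trick. Adding independent Gumbel(0, 1) noise to `logits / temperature` and taking the argmax draws a sample from `softmax(logits / temperature)`. The function name `gumbel_max_sample` and the greedy fallback at `temperature == 0` are assumptions for this sketch; ReFusion's own implementation may differ.

```python
import math
import random
from typing import List, Optional


def gumbel_max_sample(
    logits: List[float], temperature: float, rng: Optional[random.Random] = None
) -> int:
    """Hypothetical helper: sample an index from softmax(logits / temperature)
    using the Gumbel-max trick. temperature == 0 falls back to greedy argmax."""
    if not logits:
        raise ValueError("logits must be non-empty")
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scores = []
    for x in logits:
        u = rng.random()  # uniform in [0, 1)
        while u == 0.0:  # log(0) is undefined, so resample
            u = rng.random()
        g = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        scores.append(x / temperature + g)
    return max(range(len(scores)), key=lambda i: scores[i])


rng = random.Random(0)
draws = [gumbel_max_sample([0.0, math.log(3.0)], 1.0, rng) for _ in range(1000)]
print(sum(draws) / len(draws))  # roughly 0.75 for these logits
```

Lower temperatures concentrate samples on the highest-scoring token; higher temperatures push the distribution toward uniform.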
Good for
- Developers seeking advanced text generation models that prioritize both efficiency and flexible decoding strategies.
- Research into novel generation architectures that move beyond traditional left-to-right autoregression.
- Applications where the ability to generate content in a non-sequential manner could offer advantages in quality or speed.
For more technical details, refer to the arXiv paper.