
Md Sahil (Sahil) Akhtar
Analyzing Corruption Processes and Stability in Discrete Diffusion for Language Modeling
2025–2026
Physics; Electrical Engineering and Computer Science
- Physics
- Generative AI
Farias, Vivek F.
Discrete diffusion models have recently gained traction as a potential alternative to autoregressive approaches for generating discrete data such as text. Their main appeal lies in the ability to generate multiple arbitrarily positioned tokens in parallel and to iteratively refine tokens that have already been generated. However, several foundational aspects remain poorly understood. This project focuses on two core challenges: (1) designing and optimizing the corruption kernels that govern the forward noising process in discrete diffusion models, and (2) addressing stability and normalization issues identified in Score Entropy Discrete Diffusion (SEDD). Our aim is to develop a more rigorous theoretical and empirical understanding of these models in order to improve sample quality and training stability in language tasks, paving the way for discrete diffusion to become a viable alternative to today’s state-of-the-art autoregressive language models.