Discrete Diffusion Reading Group

Exploring diffusion-based generative models on discrete spaces.

About the Reading Group

Diffusion LLMs promise faster, more controllable generation than traditional autoregressive LLMs and are rapidly gaining adoption. This reading group aims to build a community for exchanging and debating emerging ideas in this space. While our primary focus is discrete diffusion models for language, we also welcome work that extends these methods to other modalities and applications, such as molecular design and drug discovery. Each session features an author-led presentation followed by Q&A, with recordings shared on our YouTube channel.

Paper Discussions

Authors present their work followed by discussions and Q&A sessions

Recorded Sessions

All sessions are recorded and available on YouTube

Community

Stay informed through our email list and Twitter/X

Meet the Organizers

Subham Sekhar Sahoo

Holds a Ph.D. from Cornell Tech, where he specialized in Diffusion Language Models. He has made foundational contributions to the field, with his work deployed at scale by Google, NVIDIA, and ByteDance across language generation and drug discovery.

Justin Deschenaux

PhD student in Machine Learning at EPFL, advised by Prof. Caglar Gulcehre. Previously interned at Apple MLR. His research interests include diffusion language models, fast generative models, and generalization.

Zhihan Yang

PhD student at Cornell CS. Previously completed his Bachelor's degrees in Mathematics and Statistics at Carleton College. He received the CRA Outstanding Undergraduate Researcher Award, and his research focuses on principled, controllable, and efficient generative models.

Upcoming Session

January 19, 2026

TiDAR: Think in Diffusion, Talk in Autoregression

Jingyu Liu will discuss TiDAR, a hybrid decoding approach that combines diffusion-style parallel drafting with autoregressive verification for high quality and high throughput.

Time: Jan 19 (Monday) · 1 PM ET / 10 AM PT / 7 PM CET / 11:30 PM IST

Meeting link: click here

Paper: TiDAR

Abstract: TiDAR is a hybrid language model that drafts tokens in parallel using diffusion, then verifies them autoregressively, to match AR quality while generating much faster.
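The draft-then-verify scheme described in the abstract can be sketched generically. The drafter and verifier below are toy stand-ins (random drafts, an accept-only-even-tokens rule), not TiDAR's actual models; they only illustrate why parallel drafting plus sequential verification speeds up decoding while keeping the verifier in control of quality.

```python
import random

random.seed(0)
VOCAB = list(range(100))

def diffusion_draft(prefix, block_size):
    """Toy stand-in for the diffusion drafter: propose a block of
    tokens in parallel (here, just random vocabulary items)."""
    return [random.choice(VOCAB) for _ in range(block_size)]

def ar_accepts(prefix, token):
    """Toy stand-in for the autoregressive verifier: accept a drafted
    token if the AR model agrees (here, a toy rule: even tokens)."""
    return token % 2 == 0

def draft_and_verify(prompt, max_len=16, block_size=4):
    """Generic draft-then-verify loop: draft a block in parallel,
    keep the longest verified prefix of the block, repeat."""
    seq = list(prompt)
    while len(seq) < max_len:
        block = diffusion_draft(seq, block_size)
        accepted = 0
        for tok in block:
            if not ar_accepts(seq, tok):
                break  # reject this token and the rest of the block
            seq.append(tok)
            accepted += 1
            if len(seq) >= max_len:
                break
        if accepted == 0:
            # Fall back to one verifier-approved token so decoding
            # always advances even when the whole draft is rejected.
            seq.append(random.choice([t for t in VOCAB if t % 2 == 0]))
    return seq

out = draft_and_verify([2, 4], max_len=10)
```

When drafts are mostly accepted, each model call yields several tokens instead of one, which is the source of the throughput gain; the fallback step guarantees progress in the worst case.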

Latest Sessions

View All Sessions
S5 | Esoteric Language Models
00:55:06

In this talk, Zhihan Yang presents Eso-LMs, which unify AR and diffusion language models. Eso-LMs enable exact likelihoods and KV caching while preserving parallel generation.

January 12, 2026
S4 | DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
1:00:19

In this talk, Shansan Gong presents DiffuCoder and discusses how diffusion language models enable global planning and iterative refinement for code generation.

December 22, 2025
S3 | OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows
1:03:02

In this talk, John Nguyen presents OneFlow, a non-autoregressive multimodal model for concurrent text and image generation.

December 15, 2025

Latest Relevant Videos

View All Videos
But How Do Diffusion Language Models Actually Work?
12:27

Jia-Bin Huang explores several ideas for applying diffusion models to language modeling.

August 3, 2025
Simple Guidance Mechanisms for Discrete Diffusion Models
7:00

ICLR 2025 video presentation of the paper.

April 15, 2025
Simple Diffusion Language Models
15:07

A quick introduction to Masked Diffusion Language Models (MDLM) by Alexander Rush.

July 3, 2024