Diffusion models gradually transform noise into structured samples through a learned reverse process.
Reverse Process
A common parameterization learns a denoising network:
\[\epsilon_\theta(x_t, t)\]

The sampling step estimates a cleaner state from a noisy state:
\[x_{t-1} \leftarrow g(x_t, \epsilon_\theta(x_t, t), t)\]

Practical Notes
- Fewer sampling steps reduce latency but may reduce sample fidelity.
- Classifier-free guidance trades diversity for stronger prompt adherence: higher guidance weights follow the condition more closely but narrow the range of samples.
- For reasoning tasks, the denoising analogy can be used at the level of intermediate states rather than pixels.
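The reverse process above can be sketched as a minimal DDPM-style sampling loop. This is an illustrative sketch, not a reference implementation: the linear beta schedule, the number of steps, the toy `toy_eps_model` stand-in for a trained \(\epsilon_\theta\), and the specific form of \(g\) (the standard DDPM posterior mean with an added-noise term) are all assumptions chosen for clarity. The optional `guidance` argument shows one common way to apply classifier-free guidance by blending conditional and unconditional predictions.

```python
import numpy as np

def toy_eps_model(x_t, t):
    # Hypothetical stand-in for a trained network eps_theta(x_t, t);
    # a real model would be a neural network conditioned on t.
    return 0.1 * x_t

def ddpm_sample(eps_model, shape, T=50, guidance=None, w=0.0, seed=0):
    rng = np.random.default_rng(seed)
    # Linear beta noise schedule (a simple, common choice).
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # start from pure noise: x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_model(x, t)
        if guidance is not None:
            # Classifier-free guidance: extrapolate the conditional
            # prediction away from an unconditional one.
            eps_uncond = guidance(x, t)
            eps = (1 + w) * eps - w * eps_uncond
        # g(x_t, eps, t): DDPM posterior mean for x_{t-1} given x_t.
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        # Add noise at every step except the final one.
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

sample = ddpm_sample(toy_eps_model, shape=(4,))
print(sample.shape)  # (4,)
```

Reducing `T` here illustrates the latency/fidelity trade-off noted above: fewer iterations mean fewer network calls but coarser denoising per step.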