Diffusion models gradually transform noise into structured samples through a learned reverse process.
Reverse Process
A common parameterization learns a denoising network:
\[\epsilon_\theta(x_t, t)\]

The sampling step estimates a cleaner state from a noisy state:
\[x_{t-1} \leftarrow g(x_t, \epsilon_\theta(x_t, t), t)\]

Practical Notes
- Fewer sampling steps reduce latency but may reduce sample fidelity.
- Classifier-free guidance trades diversity for stronger prompt adherence: higher guidance weights follow the condition more closely but narrow the range of samples.
- For reasoning tasks, the denoising analogy can be used at the level of intermediate states rather than pixels.
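The reverse process above can be sketched as a minimal DDPM-style sampling loop. This is an illustrative sketch, not a reference implementation: the linear beta schedule, the number of steps, the toy `toy_eps_model` stand-in for a trained \(\epsilon_\theta\), and the specific form of \(g\) (the standard DDPM posterior mean with an added-noise term) are all assumptions chosen for clarity. The optional `guidance` argument shows one common way to apply classifier-free guidance by blending conditional and unconditional predictions.

```python
import numpy as np

def toy_eps_model(x_t, t):
    # Hypothetical stand-in for a trained network eps_theta(x_t, t);
    # a real model would be a neural network conditioned on t.
    return 0.1 * x_t

def ddpm_sample(eps_model, shape, T=50, guidance=None, w=0.0, seed=0):
    rng = np.random.default_rng(seed)
    # Linear beta noise schedule (a simple, common choice).
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # start from pure noise: x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_model(x, t)
        if guidance is not None:
            # Classifier-free guidance: extrapolate the conditional
            # prediction away from an unconditional one.
            eps_uncond = guidance(x, t)
            eps = (1 + w) * eps - w * eps_uncond
        # g(x_t, eps, t): DDPM posterior mean for x_{t-1} given x_t.
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        # Add noise at every step except the final one.
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

sample = ddpm_sample(toy_eps_model, shape=(4,))
print(sample.shape)  # (4,)
```

Reducing `T` here illustrates the latency/fidelity trade-off noted above: fewer iterations mean fewer network calls but coarser denoising per step.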