Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value
Diffusion models have revolutionized generative modeling, but their training dynamics remain somewhat opaque. A key challenge is that the diffusion training loss does not converge to zero even under perfect training: because the regression target is inherently stochastic given the noisy input, the loss instead approaches an unknown optimal value that depends on the dataset and the diffusion process. This makes it difficult to assess whether a model has converged or simply lacks capacity.
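To see why, consider the common noise-prediction parameterization (the notation below is the standard epsilon-prediction convention; the paper itself works with a more general unified formulation of diffusion models):

```latex
% Forward process: x_t = \alpha_t x_0 + \sigma_t \epsilon, with \epsilon \sim \mathcal{N}(0, I).
% Training loss at noise level t:
\mathcal{L}(t) = \mathbb{E}_{x_0,\,\epsilon}\!\left[ \lVert \epsilon_\theta(x_t, t) - \epsilon \rVert^2 \right]
% The pointwise minimizer is the posterior mean of the noise,
\epsilon^\ast(x_t, t) = \mathbb{E}[\epsilon \mid x_t],
% so the best achievable loss is the expected posterior variance,
\mathcal{L}^\ast(t) = \mathbb{E}_{x_t}\!\left[ \operatorname{tr} \operatorname{Cov}(\epsilon \mid x_t) \right] > 0,
% which is strictly positive whenever x_0 is not uniquely determined by x_t.
```

No model, however large, can push the loss below this floor; the only question is how far above it a given model sits.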
In a recent arXiv paper, researchers from Peking University and Microsoft Research propose a solution: estimating this optimal loss value. They derive a closed-form expression for the optimal loss under a unified formulation of diffusion models and develop scalable estimators for practical use. Their approach unlocks several valuable applications:
- Training Diagnosis: By comparing the actual training loss to the estimated optimal loss at each noise level, practitioners can measure how far a model sits above the theoretical floor and pinpoint the noise levels where it underfits (a toy version of this comparison is sketched after this list).
- Improved Training Schedules: The researchers design a principled training schedule based on the gap between the actual and optimal loss, achieving FID improvements of 2%-25% across different datasets and model architectures (see the gap-weighted sampling sketch below).
- Better Scaling Laws: They show that accounting for the optimal loss leads to more accurate neural scaling laws for diffusion models, with correlation coefficients improving from 0.82 to 0.94 in some cases (a curve-fitting sketch follows below).
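For the diagnosis use case, the core operation is simple once per-noise-level loss curves are available: subtract the estimated optimal loss from the measured loss and look at where the excess is largest. Here is a minimal sketch with toy stand-in curves; in practice `measured` would come from evaluation runs and `optimal` from the paper's estimators, so nothing below is their code:

```python
import numpy as np

# Toy stand-in curves on a shared grid of noise levels (illustrative
# shapes only; real curves come from model evaluation and the estimator).
t = np.linspace(0.01, 1.0, 100)
optimal = 1.0 - np.exp(-4.0 * t)                                     # irreducible floor
measured = optimal + 0.15 * np.exp(-((t - 0.3) ** 2) / 0.02) + 0.02  # bump of underfitting

# The excess loss is the only part training can reduce; peaks mark the
# noise levels where the model underfits most.
excess = measured - optimal
print("most underfit noise levels:", np.sort(t[np.argsort(excess)[-5:]]))
```

The excess, rather than the raw loss, is the fairer progress signal, since the raw loss carries a floor that varies across noise levels.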
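For the training-schedule application, this summary doesn't spell out the paper's exact design, so the following is one natural instantiation under an explicit assumption: sample training timesteps with probability proportional to the current excess loss, so gradient steps concentrate where the model can still improve.

```python
import numpy as np

rng = np.random.default_rng(0)
t_grid = np.linspace(0.01, 1.0, 100)

# Reuse the toy curves from the diagnosis sketch above.
optimal = 1.0 - np.exp(-4.0 * t_grid)
measured = optimal + 0.15 * np.exp(-((t_grid - 0.3) ** 2) / 0.02) + 0.02
gap = np.maximum(measured - optimal, 0.0)

# Turn the per-timestep gap into a sampling distribution over timesteps.
probs = gap / gap.sum()

def sample_timesteps(batch_size: int) -> np.ndarray:
    """Draw a training batch of timesteps from the gap-weighted distribution."""
    return rng.choice(t_grid, size=batch_size, p=probs)

print(sample_timesteps(8))
```

In a real training loop the gap curve would be re-estimated periodically, since it flattens as the model improves.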
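For the scaling-law application, a standard way to account for an irreducible loss is to fit a saturating power law L(N) = L_star + A * N^(-alpha) and then correlate log(L - L_star) with log N; whether this matches the paper's exact procedure is an assumption on my part, and the numbers below are synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr

# Synthetic losses at several model sizes, flattening toward a nonzero
# floor (values are illustrative, not taken from the paper).
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])
loss = 0.50 + 2.0 * N ** -0.25 + np.random.default_rng(0).normal(0.0, 0.002, N.size)

# Saturating power law: L(N) = L_star + A * N^(-alpha).
def power_law(n, l_star, a, alpha):
    return l_star + a * n ** -alpha

(l_star, a, alpha), _ = curve_fit(power_law, N, loss, p0=(0.4, 1.0, 0.2))
print(f"estimated floor L* = {l_star:.3f}, exponent alpha = {alpha:.3f}")

# Subtracting the estimated floor linearizes the law in log-log space;
# for a clean power law, |r| approaches 1 (the slope itself is negative).
r, _ = pearsonr(np.log(loss - l_star), np.log(N))
print(f"log-log correlation after subtracting the floor: |r| = {abs(r):.3f}")
```

Without the floor term, the log-log relationship bends at large N and the correlation degrades, which is consistent with the improvement the authors report.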
The work provides concrete tools for practitioners while advancing our theoretical understanding of diffusion model training. The scalable estimators make it practical to apply these insights even to large datasets, opening new possibilities for optimizing training recipes and architecture design.
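To make the estimation idea concrete: for a finite dataset, the posterior over clean points given a noised sample is a softmax over Gaussian log-likelihoods, so the optimal denoiser, and hence the optimal loss, can be evaluated exactly up to Monte Carlo error. The brute-force sketch below (my own illustration in the epsilon-prediction convention, not the authors' implementation) costs O(dataset size) per noised sample, which is exactly the bottleneck the paper's scalable estimators are designed to avoid:

```python
import numpy as np

def optimal_eps_loss(data, alpha_t, sigma_t, n_samples=10_000, seed=0):
    """Brute-force Monte Carlo estimate of the minimal epsilon-prediction
    loss at one noise level, treating the dataset as an empirical mixture."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    x0 = data[rng.integers(n, size=n_samples)]   # clean samples
    eps = rng.standard_normal((n_samples, d))    # forward noise
    xt = alpha_t * x0 + sigma_t * eps            # noised samples

    # Posterior weights w_i proportional to N(x_t; alpha_t * x_i, sigma_t^2 I).
    sq = ((xt[:, None, :] - alpha_t * data[None, :, :]) ** 2).sum(-1)
    logw = -sq / (2.0 * sigma_t**2)
    logw -= logw.max(axis=1, keepdims=True)      # stabilize the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)

    # Optimal denoiser E[x0 | x_t], converted to the eps parameterization.
    x0_hat = w @ data
    eps_hat = (xt - alpha_t * x0_hat) / sigma_t
    return float(((eps - eps_hat) ** 2).sum(-1).mean())

# Tiny usage example on synthetic 2-D data.
data = np.random.default_rng(1).standard_normal((500, 2))
print(optimal_eps_loss(data, alpha_t=0.8, sigma_t=0.6, n_samples=2000))
```

The exact posterior computation is what makes this quadratic in dataset size; the paper's contribution is estimators that retain the principle while scaling to realistic datasets.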