6 Control what images to generate in diffusion models
This chapter covers
- Creating and training a conditional diffusion model to generate images
- Building a denoising U-Net from scratch
- Coding diffusion processes in a Denoising Diffusion Probabilistic Model
- Injecting label information into the U-Net model for controlled generation
- Implementing classifier-free guidance
In the previous chapter, you learned how a diffusion model uses a denoising U-Net to gradually transform random noise into images of clothing items, such as coats, bags, and sandals. However, the model chose which of the 10 classes to generate at random; you could not tell it what to produce. A natural question follows: Can we direct the model to create a specific item, say a sandal, a t-shirt, or a coat, on demand? This chapter shows you how.
Here, you’ll learn to build conditional diffusion models, which let you specify what to generate by conditioning the denoising process on label information. Conditioning isn’t limited to image classes: later in the book, we’ll extend the same idea to text-to-image generation, where the condition is an open-ended text prompt. Mastering conditioning is therefore essential groundwork for the more advanced generative models we’ll tackle in later chapters.
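To make this concrete before we build the real model, here is a minimal PyTorch sketch of the two ideas listed above: injecting a label embedding into the denoising network alongside the time-step embedding, and combining conditional and unconditional noise predictions with classifier-free guidance at sampling time. The tiny fully connected network, the names ConditionalDenoiser and guided_noise, and the layer sizes are illustrative assumptions, not the chapter's actual U-Net implementation.

```python
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Toy stand-in for a denoising U-Net that also takes a class label."""
    def __init__(self, img_dim=28 * 28, num_classes=10, emb_dim=64, num_steps=1000):
        super().__init__()
        self.time_emb = nn.Embedding(num_steps, emb_dim)          # one vector per diffusion step
        self.label_emb = nn.Embedding(num_classes + 1, emb_dim)   # extra index means "no label"
        self.net = nn.Sequential(
            nn.Linear(img_dim + emb_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim),
        )

    def forward(self, x, t, y):
        # Inject the condition: add the label embedding to the time-step
        # embedding and feed it to the network along with the noisy image.
        cond = self.time_emb(t) + self.label_emb(y)
        return self.net(torch.cat([x.flatten(1), cond], dim=1)).view_as(x)

def guided_noise(model, x, t, y, null_label, w=2.0):
    # Classifier-free guidance: push the conditional prediction away from
    # the unconditional one by the guidance scale w.
    eps_cond = model(x, t, y)             # prediction with the class label
    eps_uncond = model(x, t, null_label)  # prediction with the "no label" token
    return (1 + w) * eps_cond - w * eps_uncond

model = ConditionalDenoiser()
x = torch.randn(4, 28, 28)                          # a batch of noisy images
t = torch.randint(0, 1000, (4,))                    # their diffusion time steps
y = torch.tensor([0, 3, 5, 9])                      # requested classes
null = torch.full((4,), 10, dtype=torch.long)       # index 10 reserved for "no label"
print(guided_noise(model, x, t, y, null, w=2.0).shape)  # torch.Size([4, 28, 28])
```

During training, the label is randomly replaced with the "no label" index for a fraction of the examples, so the same network learns both the conditional and the unconditional predictions that guided_noise combines at sampling time.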