6 Control what images to generate in diffusion models
This chapter covers
- Creating and training a conditional diffusion model to generate the images you want
- Building a denoising U-Net from scratch, including class label and time-step conditioning
- Coding diffusion processes in a Denoising Diffusion Probabilistic Model (DDPM)
- Injecting label information into the U-Net model for controlled generation
- Implementing classifier-free guidance, a breakthrough technique in conditional generation
In the previous chapter, you learned how diffusion models generate images by gradually transforming random noise into clothing-item images, such as coats, bags, and sandals, using a denoising U-Net. However, that model could only generate images at random from among its ten classes. A natural next question follows: can we direct the model to produce a specific item, such as a sandal, a T-shirt, or a coat, on demand? This chapter shows you how.
Here, you’ll learn about conditional diffusion models, which let you specify what to generate by conditioning the model on label information. Conditioning is not limited to image classes: later in the book, we’ll extend the idea to text-to-image generation, where the condition is an open-ended text prompt. Mastering conditioning is therefore essential groundwork for the more advanced generative models we’ll tackle in later chapters.
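To make the two core ideas concrete before we build the full model, here is a minimal sketch in PyTorch. It is not the chapter's actual U-Net: the class TinyConditionedDenoiser, its layer sizes, and the guided_noise helper are illustrative assumptions. It shows (1) injecting a class-label embedding alongside the time-step embedding and (2) combining conditional and unconditional noise predictions with classifier-free guidance.

```python
# A toy sketch of label conditioning and classifier-free guidance.
# Names, shapes, and layer sizes are illustrative assumptions, not the book's model.
import torch
import torch.nn as nn

class TinyConditionedDenoiser(nn.Module):
    """Toy denoiser for 28x28 grayscale images (stands in for the real U-Net)."""
    def __init__(self, num_classes=10, emb_dim=64):
        super().__init__()
        # Time-step embedding: map the scalar step t to a vector.
        self.time_emb = nn.Sequential(
            nn.Linear(1, emb_dim), nn.SiLU(), nn.Linear(emb_dim, emb_dim))
        # Class-label embedding; one extra index serves as the "no label" token.
        self.label_emb = nn.Embedding(num_classes + 1, emb_dim)
        # A tiny MLP in place of the U-Net backbone.
        self.net = nn.Sequential(
            nn.Linear(28 * 28 + emb_dim, 256), nn.SiLU(), nn.Linear(256, 28 * 28))

    def forward(self, x, t, y):
        # x: noisy images (B, 1, 28, 28); t: time steps (B,); y: class labels (B,)
        emb = self.time_emb(t.float().unsqueeze(-1)) + self.label_emb(y)
        h = torch.cat([x.flatten(1), emb], dim=1)
        return self.net(h).view_as(x)  # predicted noise, same shape as x

def guided_noise(model, x, t, y, null_label, w=3.0):
    """Classifier-free guidance: blend conditional and unconditional predictions."""
    eps_cond = model(x, t, y)                                 # conditioned on the class
    eps_uncond = model(x, t, torch.full_like(y, null_label))  # "no label" prediction
    return (1 + w) * eps_cond - w * eps_uncond
```

The guidance scale w trades off adherence to the label against sample diversity: larger values push generations more strongly toward the requested class. We'll build the real U-Net, the DDPM training loop, and the full classifier-free guidance procedure step by step in the sections that follow.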