
6 Control what images to generate in diffusion models


This chapter covers

  • Creating and training a conditional diffusion model to generate images
  • Building a denoising U-Net from scratch
  • Coding diffusion processes in a Denoising Diffusion Probabilistic Model
  • Injecting label information into the U-Net model for controlled generation
  • Implementing classifier-free guidance

In the previous chapter, you learned how a diffusion model uses a denoising U-Net to gradually transform random noise into clothing-item images, such as coats, bags, and sandals. However, that model sampled at random from among the 10 classes; you couldn’t choose which class it produced. A natural next question arises: can we direct the model to create a specific item, such as a sandal, a t-shirt, or a coat, on demand? This chapter shows you how.

Here, you’ll learn how conditional diffusion models let you specify what you want to generate by conditioning on label information. Conditioning isn’t limited to image classes: later in the book, we’ll extend the idea to text-to-image generation, where the condition is an open-ended text prompt. Mastering conditioning is therefore essential groundwork for the more advanced generative models we’ll tackle in later chapters.
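To make the idea concrete before we dive in, here is a minimal sketch, in PyTorch, of how a class label can be fed to a denoising network and how classifier-free guidance blends the conditional and unconditional noise predictions at sampling time. The TinyDenoiser class, its label_emb layer, the class indices, and the guidance scale w are illustrative assumptions, not the model you’ll build in this chapter.

import torch
import torch.nn as nn

# Hypothetical stand-in for the denoising U-Net built later in this chapter:
# it takes a noisy image, a time step, and a class label (index 10 = "no label").
class TinyDenoiser(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes + 1, 16)  # extra slot is the null label
        self.net = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, x, t, label):
        # A real U-Net injects time-step and label embeddings into every block;
        # this toy model just adds the mean of the label embedding as a bias.
        bias = self.label_emb(label).mean(dim=1).view(-1, 1, 1, 1)
        return self.net(x) + bias

model = TinyDenoiser()
x_t = torch.randn(4, 1, 28, 28)        # a batch of noisy grayscale images
t = torch.randint(0, 1000, (4,))       # diffusion time steps (unused by this toy model)
label = torch.full((4,), 7)            # a chosen class index, e.g. 7
null_label = torch.full((4,), 10)      # the "no label" token

eps_cond = model(x_t, t, label)        # label-conditioned noise prediction
eps_uncond = model(x_t, t, null_label) # unconditional noise prediction
w = 3.0                                # guidance scale; larger w follows the label more closely
eps = eps_uncond + w * (eps_cond - eps_uncond)

The design choice previewed here is reserving one extra embedding slot as a null label, so a single network can produce both the conditional and the unconditional predictions that classifier-free guidance combines.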

6.1 Classifier-free guidance in diffusion models

6.1.1 An overview of classifier-free guidance

6.1.2 A blueprint to implement CFG

6.2 Different components of a denoising U-Net model

6.2.1 Time step embedding and label embedding

6.2.2 The U-Net denoising model architecture

6.2.3 Down blocks and up blocks in the U-Net

6.3 Building and training the denoising U-Net model

6.3.1 Building the denoising U-Net

6.3.2 The Denoising Diffusion Probabilistic Model

6.3.3 Training the diffusion model

6.4 Generating images with the trained diffusion model

6.4.1 Visualizing generated images

6.4.2 How the guidance parameter affects generated images

Summary