
6 Control what images to generate in diffusion models


This chapter covers

  • Creating and training a conditional diffusion model to generate images
  • Building a denoising U-Net from scratch
  • Coding diffusion processes in a Denoising Diffusion Probabilistic Model
  • Injecting label information into the U-Net model for controlled generation
  • Implementing classifier-free guidance

In the previous chapter, you learned how a diffusion model uses a denoising U-Net to gradually transform random noise into clothing-item images, such as coats, bags, and sandals. However, that model sampled at random from among the 10 classes; you couldn’t choose which class it produced. A natural next question arises: can we direct the model to create a specific item, such as a sandal, a t-shirt, or a coat, on demand? This chapter shows you how.

Here, you’ll learn how conditional diffusion models let you specify what you want to generate by conditioning on label information. Conditioning isn’t limited to image classes: later in the book, we’ll extend the idea to text-to-image generation, where the condition is an open-ended text prompt. Mastering conditioning is therefore essential groundwork for the more advanced generative models we’ll tackle in later chapters.
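To make the idea concrete before we dive in, here is a minimal sketch, in PyTorch, of how a class label can be fed to a denoising network and how classifier-free guidance blends the conditional and unconditional noise predictions at sampling time. The TinyDenoiser class, its label_emb layer, the class indices, and the guidance scale w are illustrative assumptions, not the model you’ll build in this chapter.

import torch
import torch.nn as nn

# Hypothetical stand-in for the denoising U-Net built later in this chapter:
# it takes a noisy image, a time step, and a class label (index 10 = "no label").
class TinyDenoiser(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes + 1, 16)  # extra slot is the null label
        self.net = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, x, t, label):
        # A real U-Net injects time-step and label embeddings into every block;
        # this toy model just adds the mean of the label embedding as a bias.
        bias = self.label_emb(label).mean(dim=1).view(-1, 1, 1, 1)
        return self.net(x) + bias

model = TinyDenoiser()
x_t = torch.randn(4, 1, 28, 28)        # a batch of noisy grayscale images
t = torch.randint(0, 1000, (4,))       # diffusion time steps (unused by this toy model)
label = torch.full((4,), 7)            # a chosen class index, e.g. 7
null_label = torch.full((4,), 10)      # the "no label" token

eps_cond = model(x_t, t, label)        # label-conditioned noise prediction
eps_uncond = model(x_t, t, null_label) # unconditional noise prediction
w = 3.0                                # guidance scale; larger w follows the label more closely
eps = eps_uncond + w * (eps_cond - eps_uncond)

The design choice previewed here is reserving one extra embedding slot as a null label, so a single network can produce both the conditional and the unconditional predictions that classifier-free guidance combines.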

6.1 Classifier-free guidance in diffusion models

6.1.1 An overview of classifier-free guidance

6.1.2 A blueprint to implement CFG

6.2 Different components of a denoising U-Net model

6.2.1 Time step embedding and label embedding

6.2.2 The U-Net denoising model architecture

6.2.3 Down blocks and up blocks in the U-Net

6.3 Building and training the denoising U-Net model

6.3.1 Building the denoising U-Net

6.3.2 The Denoising Diffusion Probabilistic Model

6.3.3 Training the diffusion model

6.4 Generating images with the trained diffusion model

6.4.1 Visualizing generated images

6.4.2 How the guidance parameter affects generated images

Summary