7 Generate high-resolution images with diffusion models


This chapter covers

  • The denoising diffusion implicit model (DDIM) noise scheduler
  • Adding the attention mechanism to denoising U-Net models
  • Generating high-resolution images with advanced diffusion models
  • Interpolating initial noise tensors to generate a series of images that smoothly transition from one image to another (a brief sketch follows this list)
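To preview that last bullet: the smooth transitions come from interpolating between two initial noise tensors and denoising each intermediate tensor into an image. Below is a minimal sketch of spherical linear interpolation (slerp), a common choice for Gaussian noise tensors. The function name, the tensor shapes, and the use of slerp rather than plain linear interpolation are illustrative assumptions here, not necessarily the exact code used later in this chapter.

import torch

def slerp(z1: torch.Tensor, z2: torch.Tensor, w: float) -> torch.Tensor:
    """Spherical interpolation between two noise tensors; w ranges over [0, 1]."""
    z1_flat, z2_flat = z1.flatten(), z2.flatten()
    # Angle between the two noise vectors
    cos_theta = torch.dot(z1_flat, z2_flat) / (z1_flat.norm() * z2_flat.norm())
    theta = torch.acos(cos_theta.clamp(-1.0, 1.0))
    sin_theta = torch.sin(theta)
    # Weighted combination that stays on the "sphere" of typical Gaussian noise
    return (torch.sin((1 - w) * theta) / sin_theta) * z1 + \
           (torch.sin(w * theta) / sin_theta) * z2

# Usage: ten intermediate noise tensors, each of which would be denoised into one frame
z_start, z_end = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
frames = [slerp(z_start, z_end, w) for w in torch.linspace(0, 1, 10)]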

In the previous two chapters, you built a foundational understanding of diffusion models, learning how they add noise to clean images and then reverse this process to generate new images from pure noise. You also saw how a model built on the powerful denoising U-Net architecture can be trained to transform pure noise into grayscale clothing-item images, step by step.
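As a quick refresher, the forward (noise-adding) process can jump to any noise level t in closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where eps is standard Gaussian noise. The snippet below is a minimal sketch of this step; the 1,000-step linear beta schedule and names such as add_noise are illustrative assumptions, not the book's exact code.

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, 0)  # cumulative product alpha_bar_t

def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Jump straight to step t: x_t = sqrt(alpha_bar_t)*x0 + sqrt(1 - alpha_bar_t)*eps."""
    eps = torch.randn_like(x0)                  # fresh Gaussian noise
    return alphas_cumprod[t].sqrt() * x0 + (1 - alphas_cumprod[t]).sqrt() * eps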

But what does it take to move from simple grayscale images to richly detailed, high-resolution color images? And how can we make these models not only more accurate, but also faster and more efficient at generating such images? This chapter tackles these questions by introducing advanced tools and techniques that are now the backbone of state-of-the-art text-to-image generators.

7.1 Attention in U-Net, DDIM, and image interpolation

7.1.1 Incorporate the attention mechanism in the U-Net

7.1.2 Denoising diffusion implicit models (DDIM)

7.1.3 Image interpolation in diffusion models

7.2 High-resolution flower images as training data

7.2.1 Visualize images in the training dataset

7.2.2 Forward diffusion on flower images

7.3 Build and train a U-Net for high-resolution images

7.3.1 Build the denoising U-Net model

7.3.2 Train the denoising U-Net model

7.4 Image generation and interpolation

7.4.1 Use the trained denoising U-Net to generate images

7.4.2 Transition from one image to another

7.5 Summary