7 Generate high-resolution images with diffusion models
This chapter covers
- The denoising diffusion implicit model (DDIM) noise scheduler
- Adding the attention mechanism to denoising U-Net models
- Generating high-resolution images with diffusion models
- Interpolating initial noise tensors to generate a series of images that smoothly transition from one image to another
In the previous two chapters, you explored the fundamentals of diffusion models. You learned how the forward diffusion process gradually adds noise to clean images until they are transformed into pure noise. In the reverse diffusion process, the trained model reconstructs images by progressively removing noise. A key component of this process is the denoising U-Net model, which learns to remove noise from noisy images until the final output is indistinguishable from real images in the training dataset.
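As a refresher, the forward process described above has a closed form: a noisy image at step t can be sampled directly from the clean image, without looping through every intermediate step. The sketch below illustrates this with NumPy; the linear beta schedule and tensor shape are illustrative choices, not the chapter's exact settings.

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta)."""
    alpha_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x0.shape)          # the noise we add
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)            # illustrative linear schedule
x0 = rng.standard_normal((3, 8, 8))              # stand-in for a clean image
xt, eps = forward_diffusion(x0, 999, betas, rng)
# at the final timestep, alpha_bar is tiny, so x_t is nearly pure noise
```

During training, the U-Net sees `xt` and the timestep and is asked to recover `eps`; that is the whole supervision signal of the diffusion objective.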
Once the model is trained, you can start with a random noise tensor and use the U-Net to iteratively remove noise, eventually generating a clear and meaningful image. Additionally, diffusion models can be conditioned on specific information—such as labels or text—to guide the image generation process. This conditional generation is crucial because modern text-to-image models rely on conditioning mechanisms to create images based on text prompts.
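The iterative generation loop can be sketched in a few lines. Below is a minimal deterministic DDIM-style sampler (eta = 0) in NumPy, assuming a `predict_noise(x, t)` callable that stands in for the trained U-Net; the toy lambda used at the bottom is purely hypothetical, just enough to exercise the loop.

```python
import numpy as np

def ddim_sample(predict_noise, shape, betas, n_steps, rng):
    """Deterministic DDIM sampling: start from pure Gaussian noise and
    denoise along a short subsequence of the original timesteps."""
    alpha_bar = np.cumprod(1.0 - betas)
    steps = np.linspace(len(betas) - 1, 0, n_steps, dtype=int)
    x = rng.standard_normal(shape)               # x_T ~ N(0, I)
    for i, t in enumerate(steps):
        eps = predict_noise(x, t)                # model's noise estimate
        # clean-image estimate implied by x_t and the predicted noise
        x0_hat = (x - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        ab_prev = alpha_bar[steps[i + 1]] if i + 1 < len(steps) else 1.0
        # DDIM update: re-noise the estimate down to the previous timestep
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1 - ab_prev) * eps
    return x

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
# hypothetical stand-in for a trained denoising U-Net
img = ddim_sample(lambda x, t: 0.1 * x, (1, 8, 8), betas, 50, rng)
```

Conditioning fits naturally into this loop: a class label or text embedding is passed to `predict_noise` alongside `x` and `t`, steering each denoising step toward images that match the condition.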