5 Generate images with diffusion models
This chapter covers
- How the forward diffusion process gradually adds noise to images (see the sketch after this list)
- How the reverse diffusion process iteratively removes noise to create a clean image
- Training a denoising U-Net model
- Using the trained U-Net to generate clothing-item images
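To make the forward process concrete before we dive in, here is a minimal sketch of the closed-form noising step used in the standard DDPM formulation. The schedule values (`T = 1000`, a linear beta range) are illustrative assumptions, not the settings this chapter settles on later:

```python
import torch

# Illustrative schedule parameters; the chapter's actual values may differ.
T = 1000                                    # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule (DDPM-style)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)   # cumulative products, one per step

def add_noise(x0, t):
    """Sample x_t from the forward process q(x_t | x_0) in closed form."""
    noise = torch.randn_like(x0)            # Gaussian noise, same shape as the image
    a_bar = alpha_bars[t]
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return xt, noise                        # the noise is the U-Net's training target
```

Because the cumulative product `alpha_bars[t]` shrinks toward zero as `t` grows, `x_t` approaches pure Gaussian noise at the final steps, which is exactly what the reverse process will start from.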
This book focuses on two main approaches to text-to-image generation: vision transformers (ViTs) and diffusion models. In diffusion-based text-to-image generation, we start with an image of pure noise and ask the trained diffusion model to denoise it slightly, conditioned on the text prompt. The result is a slightly less noisy image, which we feed back to the diffusion model to remove more noise. After many repetitions of this process, the output is a clean image that matches the text prompt. Diffusion models have become the go-to generative models for images because their iterative denoising process produces higher-quality, more detailed images than earlier generative approaches.
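The denoising loop just described can be sketched in a few lines. This is a minimal, unconditional version, assuming the standard DDPM update rule and reusing the schedule from the previous sketch; `model` stands in for the trained denoising U-Net we build later in this chapter, and the image shape is illustrative:

```python
@torch.no_grad()
def sample(model, shape=(1, 1, 28, 28)):    # shape is illustrative (grayscale 28x28)
    """Start from pure noise and denoise step by step, t = T-1 down to 0."""
    x = torch.randn(shape)                              # pure Gaussian noise
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps = model(x, t_batch)                         # U-Net's noise prediction
        a, a_bar = alphas[t], alpha_bars[t]
        # DDPM posterior mean: subtract the predicted noise, then rescale
        x = (x - (1 - a) / (1 - a_bar).sqrt() * eps) / a.sqrt()
        if t > 0:                                       # fresh noise except at the final step
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x                                            # the clean generated image
```

A text-conditioned variant would follow the same loop but pass a prompt embedding to the model as an extra input at every step, so the predicted noise steers the image toward the prompt.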