5 Generate images with diffusion models
This chapter covers
- How the forward diffusion process gradually adds noise to images
- How the reverse diffusion process iteratively removes noise
- Training a denoising U-Net model from scratch
- Using the trained model to generate new clothing-item images
Text-to-image generation has seen remarkable progress in recent years, largely thanks to two classes of models: vision transformers (ViTs) and diffusion models. Diffusion models create images through a two-step process. First, a fixed forward diffusion process gradually adds random noise to clean images, step by step, until they become pure noise; nothing is learned in this stage. Then, in the reverse diffusion process, a model is trained to undo this corruption: starting from pure noise, it iteratively removes a little noise at each step, guided by the patterns it learned during training, until a new, clean image emerges. Because generation is broken into many small denoising steps, diffusion models can produce high-resolution images whose quality often surpasses that of earlier approaches such as variational autoencoders (VAEs) and generative adversarial networks (GANs).
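To make the forward process concrete before we build it properly later in the chapter, here is a minimal PyTorch sketch of noising an image directly to an arbitrary step t. The 1,000 steps and the linear beta schedule from 1e-4 to 0.02 are common illustrative choices, not values fixed by this chapter, and the `add_noise` helper and random 28 x 28 batch are stand-ins for the real dataset and utilities we develop later.

```python
import torch

# Illustrative schedule: how much noise each of the 1,000 steps adds
num_steps = 1000
betas = torch.linspace(1e-4, 0.02, num_steps)   # per-step noise variance
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)       # cumulative signal retained

def add_noise(x0: torch.Tensor, t: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Jump straight to step t of the forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)     # broadcast over image dims
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return xt, noise

# Usage: noise a batch of 28x28 grayscale images to random timesteps
x0 = torch.randn(8, 1, 28, 28)                  # stand-in for real images
t = torch.randint(0, num_steps, (8,))
xt, noise = add_noise(x0, t)
```

The one-shot jump works because a sum of independent Gaussian noise steps is itself Gaussian, so any intermediate noise level can be sampled directly instead of looping through every step, which is what makes training on random timesteps cheap.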