7 Generate high-resolution images with diffusion models
This chapter covers
- The denoising diffusion implicit model (DDIM) noise scheduler
- Adding the attention mechanism to denoising U-Net models
- Generating high-resolution images with diffusion models
- Interpolating initial noise tensors to generate a series of images that smoothly transition from one image to another
In the previous two chapters, you explored the fundamentals of diffusion models. You learned how the forward diffusion process gradually adds noise to clean images until they are transformed into pure noise. In the reverse diffusion process, the trained model reconstructs images by progressively removing noise. A key component of this process is the denoising U-Net model, which learns to remove noise from noisy images until the final output is indistinguishable from real images in the training dataset.
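As a refresher, the forward process described above has a closed form: a noisy image at step t can be sampled directly from the clean image, without looping through every intermediate step. The sketch below illustrates this with NumPy; the linear beta schedule and tensor shape are illustrative choices, not the chapter's exact settings.

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta)."""
    alpha_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x0.shape)          # the noise we add
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)            # illustrative linear schedule
x0 = rng.standard_normal((3, 8, 8))              # stand-in for a clean image
xt, eps = forward_diffusion(x0, 999, betas, rng)
# at the final timestep, alpha_bar is tiny, so x_t is nearly pure noise
```

During training, the U-Net sees `xt` and the timestep and is asked to recover `eps`; that is the whole supervision signal of the diffusion objective.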
Once the model is trained, you can start with a random noise tensor and use the U-Net to iteratively remove noise, eventually generating a clear and meaningful image. Additionally, diffusion models can be conditioned on specific information—such as labels or text—to guide the image generation process. This conditional generation is crucial because modern text-to-image models rely on conditioning mechanisms to create images based on text prompts.
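The iterative generation loop can be sketched in a few lines. Below is a minimal deterministic DDIM-style sampler (eta = 0) in NumPy, assuming a `predict_noise(x, t)` callable that stands in for the trained U-Net; the toy lambda used at the bottom is purely hypothetical, just enough to exercise the loop.

```python
import numpy as np

def ddim_sample(predict_noise, shape, betas, n_steps, rng):
    """Deterministic DDIM sampling: start from pure Gaussian noise and
    denoise along a short subsequence of the original timesteps."""
    alpha_bar = np.cumprod(1.0 - betas)
    steps = np.linspace(len(betas) - 1, 0, n_steps, dtype=int)
    x = rng.standard_normal(shape)               # x_T ~ N(0, I)
    for i, t in enumerate(steps):
        eps = predict_noise(x, t)                # model's noise estimate
        # clean-image estimate implied by x_t and the predicted noise
        x0_hat = (x - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        ab_prev = alpha_bar[steps[i + 1]] if i + 1 < len(steps) else 1.0
        # DDIM update: re-noise the estimate down to the previous timestep
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1 - ab_prev) * eps
    return x

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
# hypothetical stand-in for a trained denoising U-Net
img = ddim_sample(lambda x, t: 0.1 * x, (1, 8, 8), betas, 50, rng)
```

Conditioning fits naturally into this loop: a class label or text embedding is passed to `predict_noise` alongside `x` and `t`, steering each denoising step toward images that match the condition.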