10 Diffusion models for images


This chapter covers

  • A walkthrough of generative AI approaches to image synthesis
  • The foundational principles of diffusion models
  • Implementing a diffusion model from scratch

Continuing our journey with generative AI, we now turn our attention to image synthesis. In the previous chapter, we explored how transformers can be used for text generation; in this chapter, we apply deep learning techniques to generating images.

At first glance, the task of generating images appears considerably different from generating text. Images consist of pixels, each defined by specific color values, whereas text is composed of a sequence of words, or tokens. In text generation, the relationships between elements are primarily sequential and largely determined by the order of tokens. In contrast, image generation involves spatial dependencies: context comes not only from the proximity of elements but also from their arrangement across the entire image. Applying the methods we learned for probabilistically predicting the next token from the previous ones does not, at first, seem feasible for generating images.
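To make the contrast concrete, here is a minimal sketch in NumPy of how the two kinds of data are typically represented. The token IDs and the 32 x 32 image size are arbitrary, illustrative values, not ones drawn from this book.

import numpy as np

# Text: a 1-D sequence of token IDs; the structure lives in the ordering.
tokens = np.array([17, 4, 92, 8, 310])  # shape: (sequence_length,)

# Image: a 3-D grid of pixel intensities; the structure is spatial,
# spread over height, width, and color channels.
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # (H, W, C)

print(tokens.shape)  # (5,) -> a single ordered axis
print(image.shape)   # (32, 32, 3) -> two spatial axes plus a color axis

A next-token predictor only has to model the single ordered axis of the token sequence, whereas an image model must capture dependencies along both spatial axes (and across color channels) at once, which is why a different generative approach is needed.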

10.1 History of VAEs and GANs

10.2 Motivation for diffusion models

10.3 Diffusion in detail

10.4 Setting up the data

10.5 The forward process

10.6 Training

10.6.1 Loss

10.7 Reversing diffusion (how to sample)

10.8 Conclusion

10.9 Exercises

10.10 Summary