17 Image generation
This chapter covers
- Variational autoencoders
- Diffusion models
- Using a pretrained text-to-image model
- Exploring the image latent spaces learned by text-to-image models
17.1 Deep learning for image generation
The most popular and successful application of creative AI today is image generation: learning latent visual spaces and sampling from them to create entirely new pictures, interpolated from real ones – pictures of imaginary people, imaginary places, imaginary cats and dogs, and so on.
In this section and the next, we’ll review some high-level concepts pertaining to image generation, alongside implementation details for two of the main techniques in this domain: variational autoencoders (VAEs) and diffusion models. Note that the techniques we present here aren’t specific to images (you could develop latent spaces of sound or music using similar models), but in practice, the most interesting results so far have been obtained with pictures, and that’s what we focus on here.
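The two basic operations behind "learning latent visual spaces and sampling from them" can be illustrated before any real model is trained. Below is a minimal sketch that assumes a hypothetical `decode` function standing in for a trained generator: here it is just a fixed random linear map from a 2-dimensional latent space to 28 × 28 pixel grids, so its outputs are noise rather than meaningful pictures. It shows (1) sampling a new latent point and decoding it, and (2) interpolating along a straight line between two latent points, which, given a trained encoder, could be the encodings of two real images.

```python
import numpy as np

# Hypothetical stand-in for a trained generator (NOT a real model): a fixed
# random linear map from a 2-D latent space to 28x28 "images" in (0, 1).
rng = np.random.default_rng(seed=0)
LATENT_DIM = 2
IMAGE_SHAPE = (28, 28)
decoder_weights = rng.normal(size=(LATENT_DIM, IMAGE_SHAPE[0] * IMAGE_SHAPE[1]))

def decode(z):
    """Map a batch of latent vectors (n, LATENT_DIM) to images (n, 28, 28)."""
    logits = z @ decoder_weights
    pixels = 1.0 / (1.0 + np.exp(-logits))  # sigmoid squashes values into (0, 1)
    return pixels.reshape(-1, *IMAGE_SHAPE)

# Sampling: draw a fresh latent point and decode it into a new picture.
z_new = rng.normal(size=(1, LATENT_DIM))
new_image = decode(z_new)

# Interpolation: walk a straight line between two latent points.
z_a = rng.normal(size=(LATENT_DIM,))
z_b = rng.normal(size=(LATENT_DIM,))
alphas = np.linspace(0.0, 1.0, num=5)[:, None]  # shape (5, 1)
z_path = (1.0 - alphas) * z_a + alphas * z_b    # shape (5, LATENT_DIM)
frames = decode(z_path)                          # shape (5, 28, 28)
print(new_image.shape, frames.shape)
```

With a real VAE or diffusion model, `decode` would be replaced by the trained generator, and the intermediate frames would show one image gradually morphing into the other.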