17 Image generation


This chapter covers

  • Variational autoencoders
  • Diffusion models
  • Using a pretrained text-to-image model
  • Exploring the image latent spaces learned by text-to-image models

17.1 Deep learning for image generation

The most popular and successful application of creative AI today is image generation: learning latent visual spaces and sampling from them to create entirely new pictures, interpolated from real ones – pictures of imaginary people, imaginary places, imaginary cats and dogs, and so on.

In this section and the next, we'll review some high-level concepts pertaining to image generation, alongside implementation details for two of the main techniques in this domain: variational autoencoders (VAEs) and diffusion models. Do note that the techniques we present here aren't specific to images – you could develop latent spaces of sound or music using similar models – but in practice, the most interesting results so far have been obtained with pictures, and that's what we focus on here.
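
Before we dive into the details, here's a minimal sketch of the core idea shared by both techniques: draw a random point from a latent space and decode it into an image. The decoder below is an untrained stand-in (its layer sizes and the 28 × 28 output shape are illustrative assumptions, not a reference architecture); in practice, it would be the trained decoder half of a VAE, or the denoising model of a diffusion pipeline.

import numpy as np
import keras
from keras import layers

latent_dim = 2  # hypothetical latent dimensionality

# A stand-in decoder: in a real workflow, this would be a trained model
# mapping latent vectors to images. Here we build an untrained one just
# so the sketch runs end to end.
decoder = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(7 * 7 * 64, activation="relu"),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
])

# Sampling: draw random points from the latent prior (a standard
# Gaussian, in the case of a VAE) and decode them into new images.
z = np.random.normal(size=(16, latent_dim))
images = decoder.predict(z)  # shape: (16, 28, 28, 1)

With a trained decoder, each random z would decode to a plausible-looking picture that exists nowhere in the training data – this is the sense in which the model "imagines" new images.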

17.1.1 Sampling from latent spaces of images

17.1.2 Variational autoencoders

17.1.3 Implementing a VAE with Keras

17.2 Diffusion models

17.2.1 The Oxford Flowers dataset

17.2.2 A U-Net denoising autoencoder

17.2.3 The concept of “diffusion time” and “diffusion schedule”

17.2.4 The training process

17.2.5 The generation process

17.2.6 Visualizing results with a custom callback

17.2.7 It’s go time!

17.3 Text-to-image models

17.4 Exploring the latent space of a text-to-image model

17.5 Chapter summary