17 Image generation


This chapter covers

  • Variational autoencoders
  • Diffusion models
  • Using a pretrained text-to-image model
  • Exploring the image latent spaces learned by text-to-image models

17.1 Deep learning for image generation

The most popular and successful application of creative AI today is image generation: learning latent visual spaces and sampling from them to create entirely new pictures, interpolated from real ones – pictures of imaginary people, imaginary places, imaginary cats and dogs, and so on.

In this section and the next, we'll review some high-level concepts pertaining to image generation, alongside implementation details for two of the main techniques in this domain: variational autoencoders (VAEs) and diffusion models. Do note that the techniques we present here aren't specific to images – you could develop latent spaces of sound or music using similar models – but in practice, the most interesting results so far have been obtained with pictures, and that's what we focus on here.
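
Before we dive into the details, here's a minimal sketch of the core idea shared by both techniques: draw a random point from a latent space and decode it into an image. The decoder below is an untrained stand-in (its layer sizes and the 28 × 28 output shape are illustrative assumptions, not a reference architecture); in practice, it would be the trained decoder half of a VAE, or the denoising model of a diffusion pipeline.

import numpy as np
import keras
from keras import layers

latent_dim = 2  # hypothetical latent dimensionality

# A stand-in decoder: in a real workflow, this would be a trained model
# mapping latent vectors to images. Here we build an untrained one just
# so the sketch runs end to end.
decoder = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(7 * 7 * 64, activation="relu"),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
])

# Sampling: draw random points from the latent prior (a standard
# Gaussian, in the case of a VAE) and decode them into new images.
z = np.random.normal(size=(16, latent_dim))
images = decoder.predict(z)  # shape: (16, 28, 28, 1)

With a trained decoder, each random z would decode to a plausible-looking picture that exists nowhere in the training data – this is the sense in which the model "imagines" new images.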

17.1.1 Sampling from latent spaces of images

17.1.2 Variational autoencoders

17.1.3 Implementing a VAE with Keras

17.2 Diffusion models

17.2.1 The Oxford Flowers dataset

17.2.2 A U-Net denoising autoencoder

17.2.3 The concept of “diffusion time” and “diffusion schedule”

17.2.4 The training process

17.2.5 The generation process

17.2.6 Visualizing results with a custom callback

17.2.7 It’s go time!

17.3 Text-to-image models

17.4 Exploring the latent space of a text-to-image model

17.5 Chapter summary