8 Generative adversarial networks (GANs)

This chapter covers

Understanding the basic components of GANs: generative and discriminative models
Evaluating generative models
Learning about popular vision applications of GANs
Building a GAN model

Generative adversarial networks (GANs) are a new type of neural network architecture introduced in 2014 by Ian Goodfellow and other researchers at the University of Montreal, including Yoshua Bengio (Goodfellow et al., 2014). Yann LeCun, Facebook’s AI research director, has called GANs “the most interesting idea in the last 10 years in ML.” The excitement is well justified. The most notable feature of GANs is their capacity to create hyperrealistic images, videos, music, and text. For example, except for the far-right column, none of the faces shown on the right side of figure 8.1 belong to real humans; they are all generated. The same is true for the handwritten digits on the left side of the figure. This shows a GAN’s ability to learn features from the training images and produce new images of its own from the patterns it has learned.

Figure 8.1 Illustration of GANs’ abilities by Goodfellow and co-authors. These are samples generated by GANs after training on two datasets: MNIST and the Toronto Faces Dataset (TFD). In both cases, the right-most column contains real training data: the training example nearest to the neighboring generated sample. Because the generated samples differ from their nearest training examples, the network is producing new data rather than reproducing memorized training images. (Source: Goodfellow et al., 2014.)

8.1 GAN architecture

8.1.1 Deep convolutional GANs (DCGANs)

8.1.2 The discriminator model

8.1.3 The generator model

8.1.4 Training the GAN

8.1.5 GAN minimax function

8.2 Evaluating GAN models

8.2.1 Inception score

8.2.2 Fréchet inception distance (FID)

8.2.3 Which evaluation scheme to use

8.3 Popular GAN applications

8.3.1 Text-to-photo synthesis

8.3.2 Image-to-image translation (Pix2Pix GAN)

8.3.3 Image super-resolution GAN (SRGAN)

8.3.4 Ready to get your hands dirty?

8.4 Project: Building your own GAN