10 Interpreting what convnets learn
This chapter covers
- Interpreting how convnets decompose an input image
- Visualizing the filters learned by convnets
- Visualizing areas in an image responsible for a certain classification decision
A fundamental problem when building a computer vision application is that of interpretability: why did your classifier think a particular image contained a fridge, when all you can see is a truck? This is especially relevant to use cases where deep learning is used as a complement to human expertise, such as medical imaging. This chapter will get you familiar with a range of techniques for visualizing what convnets learn and understanding the decisions they make.
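To give you a first taste of how approachable this is, here is a minimal, self-contained sketch of the idea behind the first of these techniques: exposing what a convnet's intermediate layers compute for a given input. The toy model, layer sizes, and dummy image below are illustrative placeholders, not listings from this chapter; we'll work through the real thing with a properly trained model later on.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A toy convnet standing in for whatever trained model you want to inspect.
inputs = keras.Input(shape=(180, 180, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)

# A second model that maps an input image to the activations
# of every Conv2D layer in the original model.
conv_layer_outputs = [
    layer.output for layer in model.layers
    if isinstance(layer, layers.Conv2D)
]
activation_model = keras.Model(inputs=model.input, outputs=conv_layer_outputs)

# Feed in a (dummy) image and inspect the resulting feature maps.
img = np.random.random((1, 180, 180, 3)).astype("float32")
activations = activation_model.predict(img)
print([a.shape for a in activations])  # [(1, 178, 178, 32), (1, 87, 87, 64)]
```

Each returned tensor is a stack of feature maps that can be plotted channel by channel, which is exactly how we'll visualize what each layer "sees" in an image.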
It’s often said that deep-learning models are “black boxes”: they learn representations that are difficult to extract and present in a human-readable form. Although this is partially true for certain types of deep-learning models, it’s definitely not true for convnets. The representations learned by convnets are highly amenable to visualization, in large part because they’re representations of visual concepts. Since 2013, a wide array of techniques has been developed for visualizing and interpreting these representations. We won’t survey all of them, but we’ll cover three of the most accessible and useful ones: