6 Understanding layers and units

 

This chapter covers

  • Dissecting a black-box convolutional neural network to understand the features or concepts that are learned by the layers and units
  • Running the network dissection framework
  • Quantifying the interpretability of layers and units in a convolutional neural network and visualizing what they have learned
  • Understanding the strengths and weaknesses of the network dissection framework

In chapters 3, 4, and 5, we focused our attention on black-box models and how to interpret them using techniques such as partial dependence plots (PDPs), LIME, SHAP, anchors, and saliency maps. In chapter 5, we specifically focused on convolutional neural networks (CNNs) and visual attribution methods, such as gradients and activation maps, that highlight the salient features the model is focusing on. All these techniques interpret the complex processing and operations that happen within a black-box model by reducing its complexity. PDPs, for instance, are model-agnostic and show the marginal, or average, global effects of feature values on the model's predictions. Techniques like LIME, SHAP, and anchors are also model-agnostic; they create a proxy model that behaves similarly to the original black-box model but is simpler and easier to interpret. Visual attribution methods, such as saliency maps, are weakly model-dependent and highlight the small portion of the input that is salient, or important, to the model's prediction.
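To make that recap concrete before we move on, the following minimal sketch computes a simple gradient-based saliency map of the kind covered in chapter 5. It assumes PyTorch and a recent version of torchvision are installed, and it uses a random tensor as a stand-in for a preprocessed input image; any image classifier would work in place of the ResNet shown here. The idea is that the gradient of the top class score with respect to the input pixels indicates which pixels the prediction is most sensitive to.

import torch
from torchvision import models

# Load a pretrained CNN purely for illustration
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Placeholder for a preprocessed image batch of shape (1, 3, 224, 224);
# requires_grad=True so gradients flow back to the pixels
image = torch.rand(1, 3, 224, 224, requires_grad=True)

scores = model(image)                      # forward pass: class scores
top_class = scores.argmax(dim=1).item()    # index of the predicted class
scores[0, top_class].backward()            # backpropagate the top score to the input

# Saliency map: maximum absolute gradient across the color channels
saliency = image.grad.abs().max(dim=1)[0]  # shape: (1, 224, 224)

Such attribution maps tell us where in the input the model is looking, but not what the layers and units inside the network have actually learned; answering that question is the goal of the network dissection framework introduced in this chapter.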

6.1 Visual understanding

6.2 Convolutional neural networks: A recap

6.3 Network dissection framework

6.3.1 Concept definition

6.3.2 Network probing

6.3.3 Quantifying alignment

6.4 Interpreting layers and units

6.4.1 Running network dissection

6.4.2 Concept detectors

6.4.3 Concept detectors by training task

6.4.4 Visualizing concept detectors

6.4.5 Limitations of network dissection

Summary