8 Using convolutions to generalize


This chapter covers

  • Understanding convolution
  • Building a convolutional neural network
  • Creating custom nn.Module subclasses
  • The difference between the module and functional APIs
  • Design choices for neural networks

In the previous chapter, we built a simple neural network that could fit (or overfit) the data, thanks to the many parameters available for optimization in the linear layers. We had issues with our model, however, in that it was better at memorizing the training set than it was at generalizing properties of birds and airplanes. Based on our model architecture, we’ve got a guess as to why that’s the case. Due to the fully connected setup needed to detect the various possible translations of the bird or airplane in the image, we have both too many parameters (making it easier for the model to memorize the training set) and no position independence (making it harder to generalize). As we discussed in the last chapter, we could augment our training data by using a wide variety of recropped images to try to force generalization, but that won’t address the issue of having too many parameters.

There is a better way! It consists of replacing the dense, fully connected affine transformation in our neural network unit with a different linear operation: convolution.
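To make the parameter-count argument concrete, here is a minimal sketch (not from the book's own listings) comparing a fully connected layer over a flattened 3 × 32 × 32 CIFAR-10 image with a small convolution. The layer sizes (1,024 hidden features, 16 output channels, 3 × 3 kernel) are illustrative choices, not values fixed by the text:

```python
import torch
from torch import nn

# Fully connected layer mapping a flattened 3x32x32 image to 1,024
# hidden features: every output unit has a weight for every pixel.
fc = nn.Linear(3 * 32 * 32, 1024)
fc_params = sum(p.numel() for p in fc.parameters())

# A convolution with 16 output channels and a 3x3 kernel reuses the
# same small set of weights at every spatial location.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv_params = sum(p.numel() for p in conv.parameters())

print(fc_params)    # 3072 * 1024 + 1024 = 3,146,752
print(conv_params)  # 16 * 3 * 3 * 3 + 16 = 448
```

The convolution gets by with a few hundred parameters instead of a few million, and because the same kernel is slid across the whole image, a bird detected in one corner is detected the same way in any other: exactly the position independence the fully connected model lacked.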

8.1 The case for convolutions

8.1.1 What convolutions do

8.2 Convolutions in action

8.2.1 Padding the boundary

8.2.2 Detecting features with convolutions

8.2.3 Looking further with depth and pooling

8.2.4 Putting it all together for our network

8.3 Subclassing nn.Module

8.3.1 Our network as an nn.Module

8.3.2 How PyTorch keeps track of parameters and submodules

8.3.3 The functional API

8.4 Training our convnet

8.4.1 Measuring accuracy

8.4.2 Saving and loading our model
