3 Convolutional neural networks


This chapter covers

  • How tensors represent spatial data
  • Defining convolutions and their uses
  • Building and training a convolutional neural network (CNN)
  • Adding pooling to make CNNs more robust
  • Augmenting image data to improve accuracy

Convolutional neural networks (CNNs) revitalized the field of neural networks, ushering in the rebranding of the field as deep learning starting in 2011 and 2012. CNNs are still at the heart of many of the most successful applications of deep learning, including self-driving cars, speech recognition systems used by smart devices, and optical character recognition. All of this stems from the fact that convolutions are powerful yet simple tools that help us encode information about the problem into the design of our network architecture. Instead of focusing on feature engineering, we spend more time engineering the architectures of our networks.

The success of convolutions comes from their ability to learn spatial patterns, which has made them the default method to use for any data resembling an image. When you apply a convolution to an image, you can learn to detect simple patterns like horizontal or vertical lines, changes in color, or grid patterns. When you stack convolutions in layers, they begin to recognize more complex patterns, building upon the simpler convolutions that came before.
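To make the idea of detecting a simple pattern concrete, here is a minimal sketch using PyTorch's `F.conv2d`. The tiny 5 × 5 "image" and the hand-written vertical-edge filter are illustrative inventions for this example; in a real CNN, the filter weights are learned during training rather than set by hand.

```python
import torch
import torch.nn.functional as F

# A toy 1x1x5x5 "image" (batch, channels, height, width): zeros on the
# left, ones on the right, so its only spatial pattern is a vertical edge.
img = torch.zeros(1, 1, 5, 5)
img[:, :, :, 3:] = 1.0

# A hand-crafted 3x3 vertical-edge filter. Each row is (-1, 0, +1), so the
# filter responds when pixel values increase from left to right.
kernel = torch.tensor([[[[-1.0, 0.0, 1.0],
                         [-1.0, 0.0, 1.0],
                         [-1.0, 0.0, 1.0]]]])

# Convolving slides the filter across every 3x3 patch of the image.
# The 3x3 output is 0 over the flat left region and large (3.0) in the
# columns that straddle the edge.
response = F.conv2d(img, kernel)
print(response.squeeze())
```

The same mechanism scales up: a trained convolutional layer holds many such filters, and later layers combine their responses to detect progressively more complex patterns.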

3.1 Spatial structural prior beliefs

3.1.1  Loading MNIST with PyTorch

3.2 What are convolutions?

3.2.1  1D convolutions

3.2.2  2D convolutions

3.2.3  Padding

3.2.4  Weight sharing

3.3 How convolutions benefit image processing

3.4 Putting it into practice: Our first CNN

3.4.1  Making a convolutional layer with multiple filters

3.4.2  Using multiple filters per layer

3.4.3  Mixing convolutional layers with linear layers via flattening

3.4.4  PyTorch code for our first CNN

3.5 Adding pooling to mitigate object movement

3.5.1  CNNs with max pooling

3.6 Data augmentation

Exercises

Summary