10 One, Two and Three Dimensional Convolution and Transposed Convolution in Neural Networks

Image analysis typically involves identifying local patterns. For instance, to do face recognition, one needs to analyze local patterns of neighboring pixels corresponding to eyes, noses and ears. The subject of the photograph may be standing on a beach in front of the ocean, but the big picture involving sand and water is irrelevant.

Convolution is a specialized operation that examines local patterns in an input signal. Convolution operators are typically used to analyze images, i.e., the input is a 2D array of pixels. To illustrate this, we will study a few examples of special-purpose convolution operations that detect edges, corners, and the average illumination in a small neighborhood of pixels. Once we have detected such local properties, we can combine them to recognize higher-level patterns like ears, noses and eyes. Those, in turn, can be combined to detect still higher-level structures like faces. This hierarchy lends itself naturally to multi-layer convolutional neural networks: the lowest layers (closest to the input) detect edges and corners, the next layers detect ears, eyes, noses and so forth.
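To make the idea of "examining local patterns" concrete, here is a minimal sketch in plain Python (not the PyTorch code used later in the chapter). It slides a small weight vector over a toy one-dimensional signal and takes a weighted sum at each position. The function name `slide_1d`, the toy signal, and both kernels are illustrative assumptions, not taken from the chapter. As in most deep learning libraries, the kernel is applied without flipping.

```python
def slide_1d(signal, kernel):
    """Slide `kernel` over `signal` (stride 1, no padding).

    Returns one weighted sum per window position. The kernel is not
    flipped, matching the convention of deep learning libraries.
    """
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical toy signal: a row of dark pixels followed by bright pixels.
signal = [0, 0, 0, 9, 9, 9]

# Equal weights compute the average illumination of each 3-pixel neighborhood.
smoothing_kernel = [1 / 3, 1 / 3, 1 / 3]

# Opposite-signed weights respond only where neighboring pixels differ.
edge_kernel = [-1, 1]

smoothed = slide_1d(signal, smoothing_kernel)
edges = slide_1d(signal, edge_kernel)

print("smoothed:", smoothed)
print("edges:   ", edges)
```

The smoothing kernel turns the abrupt jump into a gradual ramp, while the edge kernel is zero everywhere except at the position where the signal changes from dark to bright. Both outputs are shorter than the input because the kernel must fit entirely inside the signal.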

10.1 One Dimensional Convolution: Graphical and Algebraic view

10.1.1 Curve Smoothing via 1D Convolution

10.1.2 Curve Edge Detection via 1D Convolution

10.1.3 One Dimensional Convolution as Matrix Multiplication

10.1.4 PyTorch: One-dimensional convolution with custom weights

10.2 Convolution Output Size

10.3 Two Dimensional Convolution: Graphical and Algebraic view

10.3.1 Image Smoothing via 2D Convolution

10.3.2 Image Edge Detection via 2D Convolution

10.3.3 PyTorch: Two-dimensional convolution with custom weights

10.3.4 Two Dimensional Convolution as Matrix Multiplication

10.4 Three Dimensional Convolution

10.4.1 Video Motion Detection via 3D Convolution

10.4.2 PyTorch: Three-dimensional convolution with custom weights

10.5 Transposed Convolution or Fractionally Strided Convolution

10.5.1 Application of Transposed convolution: AutoEncoders and Embeddings

10.5.2 Transposed Convolution Output Size

10.5.3 Upsampling via Transposed Convolution

10.6 Adding Convolution Layers to a Neural Network

10.6.1 PyTorch: Adding Convolution Layers to a Neural Network

10.7 Pooling

10.8 Chapter Summary