10 One, Two and Three Dimensional Convolution and Transposed Convolution in Neural Networks
Image analysis typically involves identifying local patterns. For instance, to recognize a face, one needs to analyze local patterns of neighboring pixels corresponding to eyes, noses and ears. The subject of the photograph may be standing on a beach in front of the ocean, but the big picture involving sand and water is irrelevant.
Convolution is a specialized operation that examines local patterns in an input signal. Convolution operators are typically used to analyze images, i.e., the input is a 2D array of pixels. To illustrate this, we will study a few examples of special-purpose convolution operations that detect edges, corners, and the average illumination in a small neighborhood of pixels, respectively. Once we have detected such local properties, we can combine them to recognize higher-level patterns like ears, noses and eyes. Those, in turn, we can combine to detect still higher-level structures like faces. This hierarchy lends itself naturally to multi-layer convolutional neural networks: the lowest layers (closest to the input) detect edges and corners, the next layers detect ears, eyes, noses and so forth.
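To make the idea of a local pattern detector concrete, here is a minimal sketch in plain Python. It is not the implementation developed later in the chapter. The function name `conv2d_valid`, the toy image and the two hand-picked kernels are all assumptions made for illustration. The function slides a small kernel over a 2D array with stride 1 and no padding, taking a weighted sum at each position (the cross-correlation form that neural network libraries usually call convolution).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the
    weighted sum of the covered pixels at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x6 grayscale image: dark left half (0), bright right half (8).
image = [[0, 0, 0, 8, 8, 8] for _ in range(4)]

# Hand-picked kernels for illustration (in a trained network these weights are learned).
average_kernel = [[0.25, 0.25], [0.25, 0.25]]  # mean of each 2x2 neighborhood
vertical_edge_kernel = [[-1, 1], [-1, 1]]      # right column minus left column

smoothed = conv2d_valid(image, average_kernel)
edges = conv2d_valid(image, vertical_edge_kernel)

print(smoothed[0])  # [0.0, 0.0, 4.0, 8.0, 8.0]
print(edges[0])     # [0.0, 0.0, 16.0, 0.0, 0.0]
```

The averaging kernel reports the local illumination, which rises smoothly from 0 to 8 across the boundary. The edge kernel responds only where the dark and bright halves meet and is zero in the uniform regions. Each output value depends on just a 2x2 neighborhood of the input, which is the locality property described above.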