8 Training Neural Networks: Forward and Backpropagation

 

Neural networks make complicated real-life decisions by modeling the decision-making process with mathematical functions. These functions can become arbitrarily involved, but fortunately we have a simple building block called the perceptron, which can be repeated in a systematic (layered) fashion to model virtually any function. Indeed, we need not even know the function being modeled in closed form. All we need is a reasonably sized set of inputs and their corresponding correct outputs. This collection of input-output pairs is known as training data. Armed with this training data, we can train an MLP (multilayer perceptron, a.k.a. neural network) so that it emits a reasonably correct output on inputs it has never seen before.

Such neural networks, where one needs to know the correct output for each input in the training dataset, are known as supervised neural networks. The correct outputs for the training inputs are typically generated via a manual process called labeling. Labeling is expensive and time-consuming. Much research is underway on unsupervised, semi-supervised, and self-supervised networks, which eliminate or minimize labeling. But, as of now, the accuracies of unsupervised and self-supervised networks generally do not match those of supervised networks. In this chapter, we will focus on supervised neural networks. Unsupervised and self-supervised networks will be picked up in later chapters.

8.1 Differentiable step-like functions

 
 
 
 

8.1.1 Sigmoid Function

 
 
 
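The sigmoid function σ(x) = 1 / (1 + e^(-x)) squashes any real number into the open interval (0, 1), behaving like a smooth, differentiable step. Its derivative has the convenient closed form σ'(x) = σ(x)(1 - σ(x)), which backpropagation will exploit later in this chapter. A minimal sketch in PyTorch (the sample points are an illustrative choice):

import torch

def sigmoid(x):
    # Smooth step: maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + torch.exp(-x))

x = torch.linspace(-6.0, 6.0, 5)     # [-6, -3, 0, 3, 6]
print(sigmoid(x))        # ~0 on the far left, 0.5 at zero, ~1 on the far right
print(torch.sigmoid(x))  # PyTorch's built-in computes the same values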

8.1.2 TanH Function

 
 
 
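The hyperbolic tangent, tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), is another differentiable step-like function, but it is zero-centered and squashes its input into (-1, 1). It is just a rescaled and shifted sigmoid, tanh(x) = 2σ(2x) - 1, and its derivative is 1 - tanh²(x). A quick sketch verifying the sigmoid relationship (sample points again illustrative):

import torch

x = torch.linspace(-3.0, 3.0, 5)
print(torch.tanh(x))                  # zero-centered smooth step in (-1, 1)
print(2 * torch.sigmoid(2 * x) - 1)   # identical: tanh is a rescaled sigmoid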

8.2 Why Layering

 
 
 
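A single perceptron can only separate its input space with one linear boundary, so it famously cannot compute XOR. Stack just two layers, though, and XOR becomes easy. Below is a hand-wired sketch; the weights and biases are hand-picked illustrative values (not learned ones), chosen large enough to make each sigmoid act like a step:

import torch

def xor_net(a, b):
    # Hidden layer: two step-like units (weights hand-picked for illustration)
    h1 = torch.sigmoid(10*a + 10*b - 5)    # ~ OR(a, b)
    h2 = torch.sigmoid(10*a + 10*b - 15)   # ~ AND(a, b)
    # Output layer: fires for "OR but not AND", which is exactly XOR
    return torch.sigmoid(10*h1 - 20*h2 - 5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y = xor_net(torch.tensor(float(a)), torch.tensor(float(b)))
    print(a, b, round(y.item(), 3))        # ~0, ~1, ~1, ~0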

8.3 Linear Layer

 
 
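A linear layer computes y = Wx + b: every output is a weighted sum of all inputs plus a bias. In PyTorch this is nn.Linear; the layer sizes below (3 inputs, 2 outputs) are arbitrary illustrative choices:

import torch
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=2)   # computes y = W x + b
print(layer.weight.shape)    # torch.Size([2, 3]): one row of W per output
print(layer.bias.shape)      # torch.Size([2]): one bias per output

x = torch.randn(3)           # a random 3-dimensional input vector
print(layer(x))              # the resulting 2-dimensional output vector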

8.3.1 Linear layer expressed as a matrix-vector multiplication

 
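Because each output of a linear layer is the dot product of one weight row with the input vector, the whole layer is a single matrix-vector multiplication plus a bias vector. We can verify this directly against nn.Linear in a short sketch:

import torch
import torch.nn as nn

layer = nn.Linear(3, 2)
x = torch.randn(3)

# The layer's forward pass is exactly a matrix-vector product plus bias
manual = layer.weight @ x + layer.bias
print(torch.allclose(layer(x), manual))   # True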

8.3.2 Forward Propagation and Grand Output Function for an MLP of Linear Layers

 
 
 
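Forward propagation simply feeds each layer's output to the next, so the grand output function of the MLP is the composition of all the layer functions. For purely linear layers this composition collapses: W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), a single linear map. The sketch below verifies the collapse (layer sizes illustrative), which is also the reason nonlinear activations between layers are essential:

import torch
import torch.nn as nn

f1 = nn.Linear(4, 3)
f2 = nn.Linear(3, 2)

x = torch.randn(4)
y = f2(f1(x))                       # forward propagation: nested calls

# Two stacked linear layers collapse into one equivalent linear layer
W = f2.weight @ f1.weight           # combined weight matrix
b = f2.weight @ f1.bias + f2.bias   # combined bias vector
print(torch.allclose(y, W @ x + b, atol=1e-6))   # True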

8.4 Training and Backpropagation

 

8.4.1 Loss and its Minimization: Goal of Training

 
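A loss function turns "how wrong is the network?" into a single number; training then means finding the weights that minimize that number. A common choice is mean squared error, L = (1/N) * sum_i (yhat_i - y_i)^2. Computed from the definition (the sample values are illustrative):

import torch

y_true = torch.tensor([1.0, 0.0, 1.0])
y_pred = torch.tensor([0.9, 0.2, 0.6])

# Mean squared error, written out from the definition
mse = ((y_pred - y_true) ** 2).mean()
print(mse)                                            # tensor(0.0700)
print(torch.nn.functional.mse_loss(y_pred, y_true))   # built-in agrees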

8.4.2 Loss Surface and Gradient Descent

 
 
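Picture the loss as a surface over the space of weights. Gradient descent repeatedly nudges every weight a small step downhill: w ← w - η ∂L/∂w, where η is the learning rate. A minimal sketch on the one-dimensional surface L(w) = (w - 3)², whose minimum sits at w = 3 (the surface, start point, and learning rate are illustrative choices):

import torch

w = torch.tensor(0.0, requires_grad=True)   # start far from the minimum
lr = 0.1                                    # learning rate (eta)

for step in range(50):
    loss = (w - 3.0) ** 2
    loss.backward()                # compute dL/dw
    with torch.no_grad():
        w -= lr * w.grad           # step downhill: w <- w - lr * grad
        w.grad.zero_()             # reset for the next iteration

print(w.item())                    # ~3.0: we reached the bottom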

8.4.3 Why the gradient provides the best direction for descent

 
 
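The standard justification rests on the first-order Taylor expansion of the loss around the current weights. A sketch of the argument in LaTeX notation, where \mathbf{u} is a unit-length direction and \epsilon a small step size:

L(\mathbf{w} + \epsilon\,\mathbf{u}) \approx L(\mathbf{w}) + \epsilon\,\nabla L(\mathbf{w}) \cdot \mathbf{u}

By the Cauchy-Schwarz inequality, |\nabla L \cdot \mathbf{u}| \le \|\nabla L\|\,\|\mathbf{u}\| = \|\nabla L\|, with equality exactly when \mathbf{u} is parallel to \nabla L. The dot product, and hence the loss, therefore drops fastest in the direction

\mathbf{u} = -\frac{\nabla L(\mathbf{w})}{\|\nabla L(\mathbf{w})\|}

which is why gradient descent steps along the negative gradient.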

8.4.4 Gradient Descent and Local Minima

 
 
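Gradient descent only ever moves downhill, so on a bumpy loss surface it settles into whichever valley the starting point drains into, which need not be the deepest one. The quartic below is an illustrative surface (not from the text) with a shallow local minimum near w ≈ +1.13 and a deeper global one near w ≈ -1.30; the two starting points end up in different valleys:

import torch

def loss(w):
    # A surface with two valleys: a shallow one near w ~ +1.13
    # and a deeper (global) one near w ~ -1.30
    return w**4 - 3 * w**2 + w

def descend(start, lr=0.01, steps=500):
    w = torch.tensor(start, requires_grad=True)
    for _ in range(steps):
        l = loss(w)
        l.backward()
        with torch.no_grad():
            w -= lr * w.grad
            w.grad.zero_()
    return w.item()

print(descend(2.0))    # ~ +1.13: stuck in the shallow local minimum
print(descend(-2.0))   # ~ -1.30: reaches the deeper global minimum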

8.4.5 The backpropagation algorithm

 
 
 
 
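Backpropagation is the chain rule applied systematically: run the forward pass saving intermediate values, then sweep backward multiplying local derivatives together. A sketch for a single sigmoid neuron with squared-error loss, checked against PyTorch's autograd (the input, target, and initial weights are illustrative):

import torch

# One sigmoid neuron and a single (x, y) training pair
x, y = torch.tensor(2.0), torch.tensor(1.0)
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

# Forward pass: keep the intermediate values, as backprop requires
z = w * x + b
a = torch.sigmoid(z)
loss = (a - y) ** 2

# Backward pass by hand, applying the chain rule link by link
dL_da = 2 * (a - y)          # derivative of the squared error
da_dz = a * (1 - a)          # derivative of the sigmoid
dL_dw = dL_da * da_dz * x    # chain rule down to w
dL_db = dL_da * da_dz        # chain rule down to b

# PyTorch's autograd runs the same algorithm
loss.backward()
print(dL_dw.item(), w.grad.item())   # the two match
print(dL_db.item(), b.grad.item())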

8.4.6 Putting it all together: Overall Training Algorithm

 
 
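The full training algorithm simply repeats forward propagation, loss computation, backpropagation, and a gradient-descent weight update until the loss is low enough. A minimal end-to-end sketch that learns the line y = 2x - 1 (the data, learning rate, and epoch count are illustrative):

import torch
import torch.nn as nn

# Toy data: learn y = 2x - 1 from a handful of points
xs = torch.tensor([[0.0], [1.0], [2.0], [3.0]])
ys = 2 * xs - 1

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(500):
    optimizer.zero_grad()            # 1. clear old gradients
    pred = model(xs)                 # 2. forward propagation
    loss = loss_fn(pred, ys)         # 3. compute the loss
    loss.backward()                  # 4. backpropagation
    optimizer.step()                 # 5. gradient-descent weight update

print(model.weight.item(), model.bias.item())   # ~2.0 and ~-1.0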

8.5 Training a Neural Network in PyTorch

 
 
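Putting the pieces together in PyTorch: build an MLP from linear layers with tanh activations in between, pick a loss, pick an optimizer, and loop. The sketch below fits a small MLP to samples of sin(x); every concrete choice here (layer widths, Adam, the learning rate, 2,000 epochs) is illustrative rather than prescriptive:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Training data: samples of a nonlinear function (illustrative choice)
x = torch.linspace(-3.0, 3.0, 200).unsqueeze(1)
y = torch.sin(x)

# A small MLP: linear layers with tanh nonlinearities in between
model = nn.Sequential(
    nn.Linear(1, 16),
    nn.Tanh(),
    nn.Linear(16, 16),
    nn.Tanh(),
    nn.Linear(16, 1),
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if epoch % 500 == 0:
        print(f"epoch {epoch}: loss {loss.item():.5f}")

# The trained network generalizes to inputs it never saw during training
x_test = torch.tensor([[0.5], [-1.2]])
print(model(x_test))   # close to sin(0.5) ~ 0.479 and sin(-1.2) ~ -0.932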

8.6 Chapter Summary

 
 
 
 