8 Training neural networks: Forward propagation and backpropagation

 

This chapter covers

  • Sigmoid functions as differentiable surrogates for Heaviside step functions
  • Layering in neural networks: expressing linear layers as matrix-vector multiplication
  • Regression loss, forward and backward propagation, and their math

So far, we have seen that neural networks make complicated real-life decisions by modeling the decision-making process with mathematical functions. These functions can become arbitrarily complex, but fortunately, we have a simple building block called a perceptron that can be repeated systematically to model virtually any function. We need not even know the function being modeled in closed form. All we need is a reasonably sized set of sample inputs and their corresponding correct outputs. This collection of input-output pairs is known as training data. Armed with this training data, we can train a multilayer perceptron (MLP, aka neural network) to emit reasonably correct outputs on inputs it has never seen before.
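As a concrete preview of the ideas this chapter develops, the following minimal sketch shows what "training data as input-output pairs" and "training an MLP" look like in PyTorch (the library used in section 8.5). The target function y = 2x0 - 3x1 is an illustrative stand-in chosen for this sketch; in practice the underlying function is unknown and only the sample pairs are available.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Training data: sample inputs x and their corresponding correct outputs y.
x = torch.randn(100, 2)              # 100 sample inputs, 2 features each
y = 2 * x[:, :1] - 3 * x[:, 1:]      # correct outputs (normally given, not computed)

# A small multilayer perceptron: linear layers with a nonlinearity in between.
mlp = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1))

loss_fn = nn.MSELoss()               # regression loss (section 8.4.1)
optimizer = torch.optim.SGD(mlp.parameters(), lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    prediction = mlp(x)              # forward propagation
    loss = loss_fn(prediction, y)    # how wrong the current outputs are
    loss.backward()                  # backpropagation computes gradients
    optimizer.step()                 # gradient descent updates the weights

# The trained MLP now produces a reasonable output on an input it has never seen.
print(mlp(torch.tensor([[1.0, 1.0]])))   # roughly 2*1 - 3*1 = -1

Each line of the training loop corresponds to a section of this chapter: the forward pass (8.3.2), the loss (8.4.1), backpropagation (8.4.5), and gradient descent (8.4.2).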

8.1 Differentiable step-like functions

8.1.1 Sigmoid function

8.1.2 Tanh function

8.2 Why layering?

8.3 Linear layers

8.3.1 Linear layers expressed as matrix-vector multiplication

8.3.2 Forward propagation and grand output functions for an MLP of linear layers

8.4 Training and backpropagation

8.4.1 Loss and its minimization: Goal of training

8.4.2 Loss surface and gradient descent

8.4.3 Why a gradient provides the best direction for descent

8.4.4 Gradient descent and local minima

8.4.5 The backpropagation algorithm

8.4.6 Putting it all together: Overall training algorithm

8.5 Training a neural network in PyTorch

Summary