4 Fundamentals of Training Deep Neural Networks

 

This chapter covers

  • The function-approximating power of a perceptron network with the ReLU activation function
  • An introduction to the automatic differentiation mechanism used by PyTorch
  • A step-by-step implementation of a convolutional neural network that predicts digits from images in the MNIST dataset
  • Experimenting with different levels of neural network complexity to observe the impact on training and test set performance
  • A continued discussion of the generalization properties of the stochastic gradient descent algorithm

So far, we have improved a model's approximation of the true underlying function by adding more features and controlling the model complexity. For example, in the previous chapter we saw that a polynomial function's approximation capacity increases as more higher-degree terms are included. Such a model-fitting process essentially amounts to functional approximation: we approximate a complex function using a combination of simpler ones. Our goal is thus to build a machine learning model that is expressive enough to approximate the true underlying function.
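To make this concrete, here is a minimal sketch (not taken from the chapter) that fits polynomials of increasing degree to noisy samples of a sine curve. The target function, the noise level, and the chosen degrees are illustrative assumptions, but the pattern it shows, with training error shrinking as higher-degree terms are added, is the approximation behavior discussed above.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
# Noisy samples of a sine curve serve as the "true underlying function" here.
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)   # least-squares fit of a degree-d polynomial
    y_hat = np.polyval(coeffs, x)       # evaluate the fitted polynomial on the inputs
    mse = np.mean((y - y_hat) ** 2)     # training error drops as the degree grows
    print(f"degree {degree}: training MSE = {mse:.4f}")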

4.1 Multilayer perceptron

4.1.1 A two-layer neural network

4.1.2 Shallow versus deep neural network

4.2 Automatic differentiation

4.2.1 Gradient-based optimization

4.2.2 The chain rule with partial derivatives

4.2.3 Different modes of multiplication

4.3 Training a simple CNN using MNIST

4.3.1 Downloading and loading MNIST

4.3.2 Defining the prediction function

4.3.3 Defining the cost function

4.3.4 Defining the optimization procedure

4.3.5 Updating the weights via iterative training

4.4 More on generalization

4.4.1 Multiple global minima

4.4.2 Best versus worst global minimum

4.5 Summary