4 Fundamentals of Training Deep Neural Networks
This chapter covers
- The function-approximating power of perceptron networks using the ReLU activation function
- An introduction to the automatic differentiation mechanism used by PyTorch
- Step-by-step implementation of a convolutional neural network to predict digits from images in the MNIST dataset
- Experimenting with different levels of neural network complexity to observe the impact on training and test set performance
- Continued discussion of the generalization property of the stochastic gradient descent algorithm
So far, we have improved a model's ability to approximate the true underlying function by adding more features and controlling the model's complexity. For example, in the previous chapter we saw that a polynomial's approximation capacity increases as more higher-degree terms are added. Such a model-fitting process essentially amounts to function approximation: we approximate a complex function using a combination of simpler ones. Our goal is thus to build a machine learning model that is expressive enough to approximate the true underlying function.
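To make this concrete, here is a minimal sketch (an illustration of the idea, not code from the previous chapter) that fits polynomials of increasing degree to a toy target and prints the training error. The target function f(x) = sin(2πx), the sample size, and the degrees 1, 3, and 9 are all assumptions chosen for illustration:

```python
import numpy as np

# Assumed toy target for this sketch: approximate f(x) = sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=30)
y = np.sin(2 * np.pi * x)

# Fit polynomials of increasing degree and report the training error.
# Each higher degree adds more basis terms (x^2, x^3, ...), so the fit
# to the training points can only improve as the degree grows.
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, deg=degree)   # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)           # evaluate the fitted polynomial
    mse = np.mean((y - y_hat) ** 2)
    print(f"degree={degree}: training MSE = {mse:.6f}")
```

Running this, the training error shrinks as the degree increases, which is exactly the growing approximation capacity described above. Deep neural networks, as we will see in this chapter, take the same idea further by composing many simple nonlinear units instead of adding polynomial terms.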