4 Training Fundamentals
This chapter covers
- Forward feeding and backward propagation
- Splitting datasets and data pre-processing
- Using validation data to monitor overfitting
- Using checkpointing and early stopping for more economical training
- Hyperparameters vs model parameters
- Training for invariance to location and scale
- Assembling and accessing on-disk datasets
- Saving and then restoring a trained model
In this chapter, we cover the fundamentals of training a model. Prior to 2019, the majority of models were trained according to the same set of fundamental steps, which we walk through here. Consider this chapter a foundation.
The methods, techniques, and best practices covered here were developed over time through experimentation and trial and error. We start by reviewing forward feeding and backward propagation. While the concept and practice predate deep learning, it took numerous refinements over the years to make model training practical: specifically, in how we split the data, how we feed it to the model, and how we update the weights using gradient descent during backward propagation. These refinements provided the means to train models to convergence, the point where the model's prediction accuracy plateaus. Other techniques in data preprocessing and augmentation were developed to push convergence to a higher plateau and to help models generalize better to data they were not trained on.
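To make these steps concrete before we dig into the details, here is a minimal sketch, assuming TF.Keras and a synthetic dataset (the model, layer sizes, and data here are illustrative placeholders, not the chapter's examples). It splits the data into training and validation portions, feeds batches forward through a small model, and updates the weights with stochastic gradient descent during backward propagation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic dataset for illustration: 1,000 samples, 20 features, 10 classes
x = np.random.random((1000, 20)).astype("float32")
y = np.random.randint(0, 10, size=(1000,))

# A small fully connected model
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Stochastic gradient descent updates the weights during backward propagation
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# fit() performs the forward feed and backward propagation for each batch;
# validation_split holds out 20% of the data to monitor overfitting
model.fit(x, y, batch_size=32, epochs=10, validation_split=0.2)
```

Each of the pieces in this sketch, splitting the data, feeding it in batches, choosing the optimizer, and monitoring the validation metrics, is covered in depth in the sections that follow.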