5 Modern training techniques
This chapter covers
- Improving long-term training using a learning rate schedule
- Improving short-term training using optimizers
- Combining learning rate schedules and optimizers to improve any deep model’s results
- Tuning network hyperparameters with Optuna
At this point, we have learned the basics of neural networks and three types of architectures: fully connected, convolutional, and recurrent. These networks have been trained with an approach called stochastic gradient descent (SGD), which has been in use since at least the 1960s. Since then, improvements to how we learn a network’s parameters have been developed, such as momentum and learning rate decay, which can help almost any neural network on almost any problem converge to better solutions in fewer updates. In this chapter, we learn about some of the most successful and widely used variants of SGD in deep learning.
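As a small preview of where this chapter is headed, the sketch below shows how these two ideas typically look in PyTorch: momentum is a single argument to the SGD optimizer, and learning rate decay is handled by a scheduler that is stepped once per epoch. The tiny model, the random data, and the specific hyperparameter values here are placeholders for illustration only, not a recipe from this chapter.

```python
import torch
import torch.nn as nn

# Placeholder model and data; in practice you would use your own
# network and a DataLoader over a real dataset.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_func = nn.CrossEntropyLoss()
X = torch.randn(128, 20)
y = torch.randint(0, 2, (128,))

# SGD with momentum: a one-argument change to plain SGD that often
# speeds up each individual update (short-term behavior).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Learning rate decay: halve the learning rate every 10 epochs
# (long-term behavior over the whole training run).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.zero_grad()
    loss = loss_func(model(X), y)
    loss.backward()
    optimizer.step()   # apply the (momentum-adjusted) gradient update
    scheduler.step()   # decay the learning rate on a fixed schedule
```

The rest of the chapter unpacks why these small changes help and introduces several other optimizers and schedules built on the same pattern.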