5 Modern Training Techniques
This chapter covers
- Improving “long-term” training using a learning rate schedule.
- Improving “short-term” training using different optimizers.
- Combining learning rate schedules and optimizers to improve a deep model’s results.
- Tuning your network’s hyperparameters with Optuna.
At this point we have learned the basics of neural networks and three different types of architecture: fully connected, convolutional, and recurrent. All of these networks have been trained with an approach called stochastic gradient descent (SGD), which has been in use since the 1950s. Since then, improvements to how we learn the parameters of a network, such as momentum and learning rate decay, have been developed; they can improve almost any neural network on almost any problem by converging to better solutions in fewer updates. In this chapter we will learn about some of the most successful and widely used variants of SGD in deep learning.
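To make this concrete before we dive in, here is a minimal sketch of what such improvements look like in code, assuming PyTorch as the framework (the tiny model, synthetic data, and hyperparameter values are placeholders chosen only for illustration): plain SGD gains momentum with a single extra argument, and a learning rate schedule adjusts the step size over the course of training.

```python
import torch
import torch.nn as nn

# A tiny stand-in model and synthetic data, just to make the example runnable.
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
X, y = torch.randn(256, 10), torch.randn(256, 1)
loss_fn = nn.MSELoss()

# Plain SGD becomes "SGD with momentum" by passing one extra argument ...
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# ... and a learning rate schedule decays the step size over training.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()   # short-term: momentum smooths each individual update
    scheduler.step()   # long-term: shrink the learning rate as training proceeds
```

The rest of the chapter unpacks each of these pieces: why decaying the learning rate helps over the long term, how momentum-style optimizers improve each individual update, and how to search over the remaining hyperparameters automatically with Optuna.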