5 Modern Training Techniques
This chapter covers
- Improving “long-term” training using a learning rate schedule.
- Improving “short-term” training using different optimizers.
- Combining learning rate schedules and optimizers to improve a deep model’s results.
- Tuning your network’s hyperparameters with Optuna.
At this point we have learned the basics of neural networks and three different types of architectures: fully connected, convolutional, and recurrent. All of these networks have been trained with an approach called stochastic gradient descent (SGD), which dates back to the 1950s. Many improvements to how we learn a network’s parameters have been invented since then, and these newer techniques can improve the results of almost any neural network on almost any problem.
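To make these ideas concrete before we dig in, here is a minimal sketch of how the pieces fit together in PyTorch: an SGD optimizer takes the “short-term” steps, while a learning rate schedule (`ExponentialLR` here, chosen only as an illustration) shrinks those steps over the “long term.” The tiny linear model, random data, and `gamma` value are placeholders for this sketch, not code from the book.

```python
import torch

# Stand-in model and loss; the real networks come from earlier chapters.
model = torch.nn.Linear(10, 2)
loss_fn = torch.nn.CrossEntropyLoss()

# The classic SGD optimizer we have used so far.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# A learning rate schedule decays the step size as training progresses.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(10):
    x = torch.randn(32, 10)         # fake batch of 32 examples
    y = torch.randint(0, 2, (32,))  # fake integer class labels
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()   # one "short-term" parameter update
    scheduler.step()   # one "long-term" learning rate update per epoch
```

Swapping `torch.optim.SGD` for a different optimizer, or `ExponentialLR` for a different schedule, changes only those two lines; the rest of the training loop stays the same, which is what makes these techniques so easy to apply to the networks we have already built.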