In the previous two chapters, we saw how models are trained: a training service manages training processes on a remote compute cluster, running the given model algorithms. But model algorithms and training services aren't all there is to model training. There is one more component we haven't discussed yet: hyperparameter optimization (HPO). Data scientists often underestimate how strongly hyperparameter choices influence model training results, and they often overlook the fact that these choices can be automated with engineering methods.
Hyperparameters are parameters whose values must be set before the model training process starts. Learning rate, batch size, and number of hidden layers are all examples of hyperparameters. Unlike the values of model parameters, such as weights and biases, hyperparameter values cannot be learned during the training process.
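To make the distinction concrete, here is a minimal sketch (not from this chapter's training service) using PyTorch. The hyperparameters are fixed before training begins, while the optimizer updates only the model's learned parameters; the specific values and layer sizes are illustrative assumptions.

```python
import torch
from torch import nn

# Hyperparameters: chosen up front and never updated by the optimizer.
learning_rate = 1e-3
batch_size = 32
hidden_size = 64  # number of units in the hidden layer

# Model parameters (weights and biases) are created here and learned during training.
model = nn.Sequential(
    nn.Linear(10, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
loss_fn = nn.MSELoss()

# One training step on random data: only model.parameters() change;
# learning_rate, batch_size, and hidden_size stay fixed throughout.
x = torch.randn(batch_size, 10)
y = torch.randn(batch_size, 1)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```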
Research shows that hyperparameter values affect both the quality of model training and the time and memory requirements of the training algorithm. Therefore, hyperparameters must be tuned to values that work well for the training task. Nowadays, HPO has become a standard step in the deep learning model development process.
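As a first taste of what HPO looks like in code, the following sketch implements random search, one of the simplest tuning methods: sample hyperparameter values, train and evaluate a model for each sample, and keep the best-scoring configuration. The search space, trial count, and the `train_and_eval` placeholder are illustrative assumptions, not the HPO service discussed later in this chapter.

```python
import random

# Candidate values for each hyperparameter (an assumed, illustrative search space).
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "hidden_size": [32, 64, 128],
}

def train_and_eval(config):
    # Placeholder: a real implementation would train a model with `config`
    # and return its validation metric. Here we fake a score for illustration.
    return random.random()

best_config, best_score = None, float("-inf")
for trial in range(20):
    # Sample one value per hyperparameter, then evaluate that configuration.
    config = {name: random.choice(values) for name, values in search_space.items()}
    score = train_and_eval(config)
    if score > best_score:
        best_config, best_score = config, score

print("best hyperparameters:", best_config)
```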