10 Hyperparameter tuning


This chapter covers

  • Initializing the weights in a model prior to warm-up training
  • Doing hyperparameter search manually and automatically
  • Constructing a learning rate scheduler for training a model
  • Regularizing a model during training

Hyperparameter tuning is the process of finding the settings of the training hyperparameters that minimize the training time and maximize the test accuracy. Usually, these two objectives cannot both be fully optimized: if we minimize the training time, we likely will not achieve the best accuracy, and if we maximize the test accuracy, training will likely take longer.

Tuning, then, is finding the combination of hyperparameter settings that meets your targets for these objectives. For example, if your target is the highest possible accuracy, you may not concern yourself with minimizing the training time. In another situation, if you need only good (but not the best) accuracy and you are continuously retraining, you may want settings that achieve that accuracy while minimizing the training time.

Generally, no single, specific set of settings corresponds to an objective. More likely, various sets of settings within the search space will achieve your objective. You need to find only one of those sets, and that is what tuning does.
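The idea that tuning only needs to find *one* acceptable set of settings can be sketched with a toy random search. This is not the chapter's method, just an illustration under stated assumptions: the search space, the `objective` function (a made-up stand-in for "train the model and measure validation accuracy"), and the `target` threshold are all hypothetical.

```python
import random

# Hypothetical search space: values chosen purely for illustration.
SEARCH_SPACE = {
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64, 128],
}

def objective(config):
    # Stand-in for "train the model and return validation accuracy".
    # The scores below are invented, not measured.
    lr_score = {1e-1: 0.70, 1e-2: 0.92, 1e-3: 0.88, 1e-4: 0.80}[config["learning_rate"]]
    bs_score = {32: 0.00, 64: 0.02, 128: 0.01}[config["batch_size"]]
    return lr_score + bs_score

def random_search(n_trials, target=0.90, seed=0):
    """Sample random configurations; stop at the first one meeting the target."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
        if best_score >= target:
            break  # any settings meeting the target are good enough
    return best_config, best_score

best, score = random_search(n_trials=20)
print(best, score)
```

Note that the search stops as soon as any configuration meets the target: it does not try to prove the configuration is the global best, which is exactly the point made above.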

10.1 Weight initialization

10.1.1 Weight distributions

10.1.2 Lottery ticket hypothesis

10.1.3 Warm-up (numerical stability)

10.2 Hyperparameter search fundamentals

10.2.1 Manual method for hyperparameter search

10.2.2 Grid search

10.2.3 Random search

10.2.4 KerasTuner

10.3 Learning rate scheduler

10.3.1 Keras decay parameter

10.3.2 Keras learning rate scheduler

10.3.3 Ramp