10 Hyperparameter Tuning

 

This chapter covers

  • Initializing the weights in a model prior to training and warming up training
  • Doing hyperparameter search manually and automatically
  • Constructing a learning rate scheduler for training a model
  • Regularizing a model during training

Hyperparameter tuning is the process of finding the optimal settings of the training hyperparameters, so that we minimize the training time and maximize the test accuracy.

Usually these two objectives can't both be fully optimized. That is, if we minimize the training time, we will likely not achieve the best accuracy. Likewise, if we maximize the test accuracy, we will likely need a longer time to train.

Tuning is finding the combination of hyperparameter settings that meets your targets for these objectives. For example, if your target is the highest possible accuracy, you may not concern yourself with minimizing the training time. In another situation, if you only need good (but not the best) accuracy and you are continuously retraining, you may want to find settings that achieve this good accuracy while minimizing the training time.
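To make the tradeoff concrete, the short sketch below, an illustrative example rather than the chapter's own code, trains a small Keras model on MNIST with a few candidate learning rates and records both the wall-clock training time and the test accuracy for each setting. The model, dataset, and candidate values are assumptions chosen only to show the idea.

import time
import tensorflow as tf
from tensorflow import keras

# Load a small benchmark dataset and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def build_model():
    # A deliberately small, hypothetical model used only for illustration.
    return keras.Sequential([
        keras.Input(shape=(28, 28)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])

# Try a few candidate settings for one hyperparameter (the learning rate)
# and record both objectives: training time and test accuracy.
for lr in [1e-1, 1e-2, 1e-3]:
    model = build_model()
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    start = time.time()
    model.fit(x_train, y_train, epochs=2, batch_size=128, verbose=0)
    elapsed = time.time() - start
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"lr={lr}: test_acc={test_acc:.4f}, train_time={elapsed:.1f}s")

Depending on your target, you would pick the setting with the best accuracy, the shortest training time, or the best balance between the two; the rest of this chapter covers ways to automate and refine this kind of search.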

10.1  Weight Initialization

10.1.1    Weight Distributions

10.1.2    Lottery Ticket Hypothesis

10.1.3    Warmup (Numerical Stability)

10.2  Hyperparameter Search Fundamentals

10.2.1    Manual Method for Hyperparameter Search

10.2.2    Grid Search

10.2.3    Random Search

10.2.4    KerasTuner

10.3  Learning Rate Scheduler

10.3.1    Keras Decay Parameter

10.3.2    Keras Learning Rate Scheduler

10.3.3    Ramp

10.3.4    Constant Step

10.3.5    Cosine Annealing

10.4  Regularization

10.4.1    Weight Regularization

10.4.2    Label Smoothing

10.5  Summary