concept hyperparameter optimization in category deep learning

appears as: hyperparameter optimization
Deep Learning with JavaScript: Neural networks in TensorFlow.js

This is an excerpt from Manning's book Deep Learning with JavaScript: Neural networks in TensorFlow.js.

  • Hyperparameters are configurations concerning a machine-learning model’s structure, the properties of its layers, and its training process. They are distinct from the model’s weight parameters in that 1) they do not change during the model’s training process, and 2) they are often discrete. Hyperparameter optimization is the process of searching for hyperparameter values that minimize a loss on the validation dataset. It is still an active area of research; currently, the most frequently used methods include grid search, random search, and Bayesian methods (a grid-search sketch follows this list).
  • The process of selecting good hyperparameter values is referred to as hyperparameter optimization or hyperparameter tuning. The goal of hyperparameter optimization is to find a set of hyperparameters that leads to the lowest validation loss after training. Unfortunately, there is currently no definitive algorithm that can determine the best hyperparameters given a dataset and the machine-learning task involved. The difficulty lies in the fact that many of the hyperparameters are discrete, so the validation loss is not differentiable with respect to them. For example, the number of units in a dense layer and the number of dense layers in a model are integers, and the type of optimizer is a categorical parameter. Even for hyperparameters that are continuous and against which the validation loss is differentiable (for example, regularization factors), it is usually too computationally expensive to track gradients with respect to them during training, so gradient descent in the space of such hyperparameters is not feasible in practice. Hyperparameter optimization remains an active area of research, one that deep-learning practitioners should keep an eye on.

    Given the lack of a standard, out-of-the-box methodology or tool for hyperparameter optimization, deep-learning practitioners often rely on a few practical approaches. First, if the problem at hand is similar to a well-studied problem (say, any of the examples you can find in this book), you can start by applying a similar model to your problem and “inheriting” its hyperparameters. Later, you can search in a relatively small hyperparameter space around that starting point, as in the random-search sketch below.
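
To make the grid-search idea concrete, here is a minimal sketch in TensorFlow.js. It assumes a simple regression setup with rank-2 tensors trainXs, trainYs, valXs, and valYs already prepared; the grid values, the two-layer architecture, and the epoch count are illustrative assumptions, not taken from the book.

```js
import * as tf from '@tensorflow/tfjs';

// Trains a small regression model with the given hyperparameters and
// returns the final-epoch validation loss.
async function trainAndEvaluate({units, learningRate},
                                trainXs, trainYs, valXs, valYs) {
  const model = tf.sequential();
  model.add(tf.layers.dense({
    units, activation: 'relu', inputShape: [trainXs.shape[1]]
  }));
  model.add(tf.layers.dense({units: 1}));
  model.compile({
    optimizer: tf.train.adam(learningRate),
    loss: 'meanSquaredError'
  });
  const history = await model.fit(trainXs, trainYs, {
    epochs: 20,
    validationData: [valXs, valYs],
    verbose: 0
  });
  model.dispose();  // Free the model's memory before the next trial.
  const valLosses = history.history.val_loss;
  return valLosses[valLosses.length - 1];
}

// Grid search: exhaustively try every combination on a small,
// hand-picked grid and keep the one with the lowest validation loss.
async function gridSearch(trainXs, trainYs, valXs, valYs) {
  const unitOptions = [16, 32, 64];        // discrete hyperparameter
  const learningRateOptions = [1e-2, 1e-3];
  let best = {valLoss: Infinity, units: null, learningRate: null};
  for (const units of unitOptions) {
    for (const learningRate of learningRateOptions) {
      const valLoss = await trainAndEvaluate(
          {units, learningRate}, trainXs, trainYs, valXs, valYs);
      if (valLoss < best.valLoss) {
        best = {valLoss, units, learningRate};
      }
    }
  }
  return best;  // e.g., {valLoss: 0.12, units: 32, learningRate: 0.001}
}
```

Note that each hyperparameter combination trains a model from scratch, which is why the cost of grid search grows multiplicatively with the number of grid dimensions.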
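The “inherit and search nearby” strategy pairs naturally with random search: instead of enumerating a grid, you sample hyperparameters in a small neighborhood of the inherited starting point. The base values and perturbation ranges below are assumptions for illustration, and the sketch reuses the hypothetical trainAndEvaluate() helper from the grid-search example above.

```js
// Samples hyperparameters near an "inherited" starting point: the unit
// count is perturbed by a factor between 0.5x and 2x, and the learning
// rate by a factor between ~0.3x and ~3x on a log scale.
function sampleNearbyHyperparameters(baseUnits = 32, baseLearningRate = 1e-3) {
  return {
    units: Math.round(baseUnits * Math.pow(2, Math.random() * 2 - 1)),
    learningRate: baseLearningRate * Math.pow(10, Math.random() - 0.5)
  };
}

// Random search: draw a fixed budget of samples and keep the best one,
// reusing trainAndEvaluate() from the grid-search sketch above.
async function randomSearch(numTrials, trainXs, trainYs, valXs, valYs) {
  let best = {valLoss: Infinity, units: null, learningRate: null};
  for (let i = 0; i < numTrials; ++i) {
    const candidate = sampleNearbyHyperparameters();
    const valLoss = await trainAndEvaluate(
        candidate, trainXs, trainYs, valXs, valYs);
    if (valLoss < best.valLoss) {
      best = {valLoss, ...candidate};
    }
  }
  return best;
}
```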
