9 Loss, optimization, and regularization
This chapter covers
- Geometrical and algebraic introductions to loss functions
- Geometrical intuitions for softmax
- Optimization techniques, including SGD, momentum, Nesterov momentum, AdaGrad, and Adam
- Regularization and its relationship to Bayesian approaches
- Overfitting during training, and dropout
By now, it should be etched in your mind that neural networks are essentially function approximators. In particular, a neural network classifier models the decision boundaries between classes in the feature space (the space in which every combination of input features is a single point). In supervised classification, each training input is marked in this space with a class label, the ground truth, which is often assigned manually. The training process iteratively learns a function whose decision boundaries separate these labeled training points into their respective classes. If the training data set is reasonably representative of the true distribution of possible inputs, the trained network (the learned function that models the class boundaries) will classify never-before-seen inputs with good accuracy.
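To make this picture concrete, the following minimal NumPy sketch (an illustrative example, not a listing from this book, and using a simple logistic classifier rather than a full neural network) trains on labeled 2-D points and learns a linear decision boundary that it then applies to never-before-seen inputs:

```python
# Hypothetical illustration: a tiny logistic classifier learns a linear
# decision boundary between two labeled clusters of 2-D points.
import numpy as np

rng = np.random.default_rng(0)

# Sampled training data: two clusters in feature space with ground-truth labels
X = np.vstack([rng.normal(-1.0, 0.5, (100, 2)),   # class 0
               rng.normal(+1.0, 0.5, (100, 2))])  # class 1
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(500):                              # iterative training
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)               # gradient of the mean cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w                              # gradient-descent update
    b -= lr * grad_b

# The learned boundary is the line w . x + b = 0; classify unseen points
X_new = np.array([[-0.8, -1.2], [1.1, 0.9]])
print((X_new @ w + b > 0).astype(int))            # expected: [0 1]
```

A neural network plays the same role as w and b here, except that its layered, nonlinear function can carve out far more complex decision boundaries than a single straight line.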