9 Loss, Optimization and Regularization
By now, it should be etched in the reader's mind that neural networks are essentially function approximators. In particular, a neural network classifier models the decision boundaries between the classes in the feature space (the space in which every combination of input features is a specific point). A supervised classifier collects sample training data points in this space, each with a known class label. The training process iteratively learns a function that creates decision boundaries separating the sampled training data points. If the training data set is reasonably representative of the true classes, the network (i.e., the learnt function which models the class boundaries) will classify previously unseen inputs with good accuracy.
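To make this concrete, below is a minimal illustrative sketch (not drawn from any particular library or from this book's code) of the simplest possible classifier, a single perceptron, learning a linear decision boundary between two toy classes of 2-D points and then classifying a point it has never seen. The cluster centers, learning rate, and iteration count are arbitrary choices made for the example.

import numpy as np

rng = np.random.default_rng(0)

# Toy training data: class 0 clustered near (0, 0), class 1 near (3, 3).
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(3.0, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w = np.zeros(2)   # connection weights to be learnt
b = 0.0           # bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Iteratively adjust the weights so the boundary separates the
# sampled training points (gradient descent on cross-entropy loss).
for _ in range(1000):
    p = sigmoid(X @ w + b)           # network output in (0, 1)
    grad_w = X.T @ (p - y) / len(y)  # gradient of the loss w.r.t. w
    grad_b = np.mean(p - y)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

# A previously unseen input near the class-1 cluster is classified as 1.
print(sigmoid(np.array([2.8, 3.2]) @ w + b) > 0.5)  # True

Because the toy data is representative of the two classes, the learnt boundary generalizes to the unseen point, exactly as described above.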
When we select a specific neural network architecture (with a fixed set of layers, each with a fixed set of perceptrons and specific connections), we have essentially frozen the family of functions we will use as the function approximator. We still have to "learn" the exact weights of the connections between the perceptrons (sometimes called neurons). The training process iteratively sets these weights so as to best classify the training data points. This is done by designing a loss function which measures the departure of the network output from the desired result; training then repeatedly adjusts the weights to minimize this loss. There is a variety of loss functions to choose from.
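As an illustration, here is a short sketch of two widely used loss functions, mean squared error and binary cross-entropy, each measuring how far a network's output departs from the desired result. The example predictions are invented for demonstration.

import numpy as np

def mse_loss(pred, target):
    # Mean squared error: average squared departure from the target.
    return np.mean((pred - target) ** 2)

def cross_entropy_loss(pred, target, eps=1e-12):
    # Binary cross-entropy: heavily penalizes confident wrong outputs.
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

target = np.array([1.0, 0.0, 1.0])
good = np.array([0.9, 0.1, 0.8])  # close to the desired result -> small loss
bad = np.array([0.4, 0.6, 0.3])   # far from the desired result -> large loss

print(mse_loss(good, target), mse_loss(bad, target))
print(cross_entropy_loss(good, target), cross_entropy_loss(bad, target))

Running this shows both losses are small for the accurate prediction and large for the inaccurate one; the choice among such functions, and its consequences for training, is the subject of this chapter.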