4 Building loss functions with the likelihood approach
This chapter covers:
- Using the maximum likelihood principle for estimating model parameters
- Determining a loss function for classification problems
- Determining a loss function for regression problems
Deep learning models often have millions of parameters that you need to determine during the training process. In chapter 3 you saw how to determine the parameter values by optimizing a loss function with stochastic gradient descent (SGD). But how did we arrive at the loss function? In the linear regression problem, we used the mean squared error as the loss function. We don’t claim that it is a dumb idea to minimize the squared distances of the data points from the curve. But why use the squared differences and not, for example, the absolute differences?
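To see that the choice matters, here is a minimal sketch (using NumPy, with made-up numbers chosen for illustration) comparing the two candidate losses on the same predictions. Note how a single outlier dominates the squared version far more than the absolute one:

```python
import numpy as np

# Hypothetical targets and model predictions; the last prediction is an outlier.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 1.8, 2.5, 8.0])

mse = np.mean((y_true - y_pred) ** 2)   # mean squared error
mae = np.mean(np.abs(y_true - y_pred))  # mean absolute error

print(mse)  # 4.135 -- the outlier contributes 16.0 of the total
print(mae)  # 1.3   -- the outlier contributes only 4.0
```

Both are plausible ways to measure the distance between predictions and data, which is exactly why we need a principle, rather than taste, to pick one.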
Concerning classification, in chapter 2 we considered a problem where the task was to decide whether a banknote was counterfeit or not. In another example, you classified images of handwritten digits (0, 1, ..., 9). In those cases we used a loss function called categorical cross entropy. What is this loss, and how do we get to it in the first place?
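As a reminder of what that loss computes, here is a minimal sketch (using NumPy, with an illustrative three-class example) of categorical cross entropy for a single observation with a one-hot encoded label:

```python
import numpy as np

# One-hot encoded true label (class 2 out of 3) and the model's
# predicted class probabilities -- both values are illustrative.
y_true = np.array([0.0, 0.0, 1.0])
y_pred = np.array([0.1, 0.2, 0.7])

# Categorical cross entropy: the negative log-probability that the
# model assigns to the true class.
cce = -np.sum(y_true * np.log(y_pred))
print(cce)  # equals -log(0.7), about 0.357
```

The loss is small when the model puts high probability on the correct class and grows without bound as that probability approaches zero. Why this particular form is the right one is what the likelihood approach in this chapter will explain.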