4 Building loss functions with the likelihood approach
This chapter covers
- Using the maximum likelihood approach for estimating model parameters
- Determining a loss function for classification problems
- Determining a loss function for regression problems
In the last chapter, you saw how to determine parameter values by optimizing a loss function with stochastic gradient descent (SGD). This approach also works for deep learning models that have millions of parameters. But how did we arrive at the loss function? In the linear regression problem (see sections 1.4 and 3.1), we used the mean squared error (MSE) as the loss function. We don't claim that it is a bad idea to minimize the squared distances between the data points and the curve. But why use squared distances and not, for example, absolute differences?
It turns out that there is a generally valid approach for deriving the loss function when working with probabilistic models: the maximum likelihood approach. In this chapter, you will see that, under certain assumptions that we discuss in detail, the maximum likelihood approach yields the MSE as the loss function for linear regression.
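To preview the connection between likelihood and MSE, here is a minimal sketch that assumes the standard textbook setting: each observation is drawn independently from a Gaussian centered on the model's prediction, with the same fixed standard deviation `sigma`. The toy data, the helper names `mse`, `gaussian_nll`, and `predict`, and the slope-only model are all hypothetical choices made for illustration. The sketch compares a brute-force grid search over the slope using the MSE with one using the Gaussian negative log-likelihood (NLL).

```python
import math

# Hypothetical toy data, made up for illustration (roughly y = 2x plus noise)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]

def mse(y_true, y_pred):
    """Mean squared error between observations and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def gaussian_nll(y_true, y_pred, sigma=1.0):
    """Negative log-likelihood of y_true under N(y_pred, sigma^2).

    Assumption: observations are independent and share one fixed sigma.
    """
    n = len(y_true)
    const = 0.5 * n * math.log(2 * math.pi * sigma ** 2)  # does not depend on the model
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return const + sse / (2 * sigma ** 2)

def predict(a, xs):
    # Hypothetical simple model through the origin: y_hat = a * x
    return [a * xi for xi in xs]

# Grid search over candidate slopes, keeping the one with the lowest loss
slopes = [i / 100 for i in range(0, 401)]
best_mse = min(slopes, key=lambda a: mse(y, predict(a, x)))
best_nll = min(slopes, key=lambda a: gaussian_nll(y, predict(a, x), sigma=1.0))

print(best_mse, best_nll)  # → 2.0 2.0
```

Both criteria select the same slope because, with `sigma` held fixed, the NLL equals n/2 · log(2πσ²) + n/(2σ²) · MSE, an increasing affine function of the MSE. Minimizing one therefore minimizes the other. If `sigma` were not fixed, or the noise were not Gaussian, this equivalence would no longer hold in general.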