4 Building loss functions with the likelihood approach

This chapter covers

Using the maximum likelihood approach for estimating model parameters
Determining a loss function for classification problems
Determining a loss function for regression problems

In the last chapter, you saw how to determine parameter values by optimizing a loss function with stochastic gradient descent (SGD). This approach also works for deep learning models that have millions of parameters. But how did we arrive at the loss function? In the linear regression problem (see sections 1.4 and 3.1), we used the mean squared error (MSE) as the loss function. We don’t claim that it is a bad idea to minimize the squared distances of the data points from the curve. But why use the squared differences and not, for example, the absolute differences?

It turns out that there is a generally valid approach for deriving the loss function when working with probabilistic models. This approach is called the maximum likelihood approach. You will see in this chapter that, for linear regression, the maximum likelihood approach yields the MSE as the loss function under certain assumptions, which we discuss in detail.
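As a small preview of that claim, the following sketch compares the MSE with an average negative log-likelihood on synthetic data. It assumes Gaussian noise with a fixed, known standard deviation, which is the standard setting in which the two losses agree; the data, the grid of candidate parameters, and all variable names are made up for illustration and are not taken from the book's listings.

```python
import numpy as np

# Synthetic data (assumption): a linear relation plus Gaussian noise.
rng = np.random.default_rng(0)
sigma = 0.3  # assumed fixed, known noise standard deviation
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma, size=x.shape)

# Grid of candidate slopes and intercepts for the line y = a * x + b.
slopes = np.linspace(0.0, 4.0, 81)
intercepts = np.linspace(-1.0, 3.0, 81)
A, B = np.meshgrid(slopes, intercepts, indexing="ij")

# Residuals for every (slope, intercept) pair; shape (81, 81, 50).
resid = y - (A[..., None] * x + B[..., None])

# Loss 1: mean squared error.
mse_grid = np.mean(resid**2, axis=-1)

# Loss 2: average negative log-likelihood of the data under a Gaussian
# centered on the line, with standard deviation sigma.
nll_grid = np.mean(
    0.5 * np.log(2 * np.pi * sigma**2) + resid**2 / (2 * sigma**2),
    axis=-1,
)

# Both losses pick the same grid point as the best parameter pair.
best_mse = np.unravel_index(np.argmin(mse_grid), mse_grid.shape)
best_nll = np.unravel_index(np.argmin(nll_grid), nll_grid.shape)
print("best (slope, intercept) by MSE:", slopes[best_mse[0]], intercepts[best_mse[1]])
print("best (slope, intercept) by NLL:", slopes[best_nll[0]], intercepts[best_nll[1]])
```

Under these assumptions the negative log-likelihood is just the MSE scaled by 1/(2·sigma²) plus a constant, so both losses are minimized by the same parameters. With a different noise assumption, the likelihood approach would lead to a different loss.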

4.1 Introduction to the maximum likelihood principle, the mother of all loss functions

4.2 Deriving a loss function for a classification problem

4.3 Deriving a loss function for regression problems

4.4 Summary