4 Building loss functions with the likelihood approach

 

This chapter covers

  • Using the maximum likelihood approach for estimating model parameters
  • Determining a loss function for classification problems
  • Determining a loss function for regression problems

In the last chapter, you saw how you can determine parameter values by optimizing a loss function with stochastic gradient descent (SGD). This approach also works for DL models that have millions of parameters. But how did we arrive at the loss function in the first place? In the linear regression problem (see sections 1.4 and 3.1), we used the mean squared error (MSE) as the loss function. Minimizing the squared distances of the data points from the fitted curve is certainly not a bad idea. But why use the squared differences and not, for example, the absolute differences?
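To make the question concrete, the following minimal sketch (not one of the book's listings; the synthetic data, learning rate, and epoch count are illustrative assumptions) fits a line with plain SGD twice: once minimizing the squared differences and once minimizing the absolute differences. On clean data both runs recover similar parameters, so the choice between them needs a principled justification.

import numpy as np

rng = np.random.default_rng(42)

# Synthetic linear data with Gaussian noise (illustrative assumption)
x = rng.uniform(-2, 2, size=200)
y = 1.5 * x + 0.5 + rng.normal(0, 0.4, size=200)

def fit_sgd(loss_grad, lr=0.02, epochs=200):
    # Fit y = a*x + b with per-sample SGD.
    # loss_grad returns the derivative of the loss w.r.t. the prediction.
    a, b = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            res = a * x[i] + b - y[i]   # prediction minus target
            g = loss_grad(res)          # dLoss/d(prediction)
            a -= lr * g * x[i]          # chain rule: d(prediction)/da = x
            b -= lr * g                 # chain rule: d(prediction)/db = 1
    return a, b

# Squared differences: L = res**2, so dL/d(prediction) = 2*res
a_mse, b_mse = fit_sgd(lambda res: 2 * res)
# Absolute differences: L = |res|, so dL/d(prediction) = sign(res)
a_mae, b_mae = fit_sgd(lambda res: np.sign(res))

print(f"squared loss:  a = {a_mse:.2f}, b = {b_mse:.2f}")
print(f"absolute loss: a = {a_mae:.2f}, b = {b_mae:.2f}")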

It turns out that there is a generally valid approach for deriving loss functions when working with probabilistic models: the maximum likelihood approach (MaxLike). You'll see that, under certain assumptions that we discuss in detail in this chapter, the MaxLike approach yields the MSE as the loss function for linear regression.
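As a preview of the argument (a sketch under the standard assumption of independent Gaussian noise with constant variance), suppose the data are generated as $y_i = f_\theta(x_i) + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma^2)$. The likelihood of the training data is then

$$\mathcal{L}(\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - f_\theta(x_i))^2}{2\sigma^2}\right)$$

Taking the negative logarithm turns the product into a sum:

$$-\log \mathcal{L}(\theta) = \frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n} \left(y_i - f_\theta(x_i)\right)^2$$

For fixed $\sigma$, the first term is a constant, so maximizing the likelihood is equivalent to minimizing the sum of squared differences, that is, the MSE. Had we instead assumed Laplace-distributed noise, the same argument would lead to the absolute differences.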

4.1 Introduction to the MaxLike principle: The mother of all loss functions

 

4.2 Deriving a loss function for a classification problem

 
 

4.2.1 Binary classification problem

 
 

4.2.2 Classification problems with more than two classes

 
 
 

4.2.3 Relationship between NLL, cross entropy, and Kullback-Leibler divergence

 
 
 

4.3 Deriving a loss function for regression problems

 
 

4.3.1 Using an NN without hidden layers and with one output neuron for modeling a linear relationship between input and output

 
 

4.3.2 Using an NN with hidden layers to model nonlinear relationships between input and output

 
 
 
 

4.3.3 Using an NN with an additional output for regression tasks with nonconstant variance

 
 

Summary

 
 
 
 