
4 Optimizing the training process - Underfitting, overfitting, testing, and regularization


This mini-chapter covers

  • What underfitting and overfitting are
  • Underfitting and overfitting in regression models
  • A solution for avoiding overfitting: testing the model
  • Using a model complexity graph to make decisions about our model
  • Another solution for avoiding overfitting: regularization
  • Calculating the complexity of a model using the L1 and L2 norms
  • Picking the best model in terms of performance and complexity

Imagine that you have learned some great machine learning algorithms and are ready to apply them. You go to work as a data scientist, and your first task is to build a machine learning model for a dataset of customers. You build it and put it in production. Then everything goes wrong: the model doesn’t do a good job of making predictions. What happened?

It turns out that this is a very common story. Many things can go wrong with our models, and fortunately, we have a number of techniques to improve them. In this chapter, I show you two problems that happen very often when training models: underfitting and overfitting. I then show you some solutions for avoiding them: testing and validation, the model complexity graph, and regularization.
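Before diving in, here is a minimal sketch (not from this chapter; the data and code are hypothetical) of the core idea: if we fit polynomials of increasing degree to noisy data and compare the error on the training points with the error on held-out test points, a degree that is too low underfits (both errors are high), while a degree that is too high overfits (training error keeps dropping while test error does not).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a gentle curve plus noise
x = np.linspace(-3, 3, 40)
y = x**2 - 2 * x + rng.normal(0, 2, size=x.shape)

# Hold out every fourth point as a test set; train on the rest
train = np.arange(len(x)) % 4 != 0
test = ~train

def errors(degree):
    # Fit a polynomial on the training points only,
    # then measure root-mean-square error on each subset
    coeffs = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coeffs, x)
    rmse = lambda mask: np.sqrt(np.mean((pred[mask] - y[mask]) ** 2))
    return rmse(train), rmse(test)

for degree in (1, 2, 9):
    tr, te = errors(degree)
    print(f"degree {degree}: train RMSE {tr:5.2f}, test RMSE {te:5.2f}")
```

Raising the degree always lowers the training error, because a higher-degree polynomial can bend closer to every training point; the test error is what reveals whether the extra flexibility helped or merely memorized noise. The rest of the chapter develops this idea carefully.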

4.1   An example of underfitting and overfitting using polynomial regression

4.2   How do we get the computer to pick the right model? By testing

4.2.1   How do we pick the testing set and how big should it be?

4.2.2   Can we use our testing data for training the model? No.

4.3   Where did we break the golden rule, and how do we fix it? The validation set

4.4   A numerical way to decide how complex our model should be - The model complexity graph

4.5   Another alternative for avoiding overfitting - Regularization

4.5.1   Another example of overfitting - Movie recommendations

4.5.2   Measuring how complex a model is - The L1 and L2 norms

4.5.3   Modifying the cost function to solve our problem - Lasso regression and ridge regression

4.5.4   Regulating the amount of performance and complexity in our model - The regularization parameter

4.5.5   Effects of L1 and L2 regularization on the coefficients of the model

4.5.6   An intuitive way to see regularization

4.6   Summary