This mini-chapter covers
- What are underfitting and overfitting?
- Underfitting and overfitting in regression models.
- A solution for avoiding overfitting: Testing the model.
- Using a model complexity graph to make decisions about our model.
- Another solution to avoid overfitting: Regularization.
- Calculating the complexity of the model using the L1 and L2 norms.
- Picking the best model in terms of performance and complexity.
Imagine that you have learned some great machine learning algorithms, and you are ready to apply them. You go to work as a data scientist, and your first task is to build a machine learning model for a dataset of customers. You build it and put it in production. Then everything goes wrong: the model doesn’t do a good job of making predictions. What happened?
It turns out that this is a very common story. Many things can go wrong with our models, and fortunately, we have a number of techniques to improve them. In this chapter, I show you two problems that happen very often when training models: underfitting and overfitting. I then show you some solutions for avoiding underfitting and overfitting in our models: testing and validation, the model complexity graph, and regularization.