Chapter 11. Preventing overfitting with ridge regression, LASSO, and elastic net


This chapter covers

  • Managing overfitting in regression problems
  • Understanding regularization
  • Using the L1 and L2 norms to shrink parameters

Our societies are full of checks and balances. In our political systems, parties balance each other (in theory) to find solutions that sit at neither extreme of each other’s views. Professions such as financial services have regulatory bodies to deter wrongdoing and to ensure that what they say and do is truthful and accurate. It turns out we can apply our own form of regulation to the machine learning process, too, to prevent algorithms from overfitting the training set. In machine learning, we call this regulation regularization.

11.1. What is regularization?

In this section, I’ll explain what regularization is and why it’s useful. Regularization (sometimes called shrinkage) is a technique that prevents the parameters of a model from becoming too large and “shrinks” them toward 0. The result is models that have less variance when making predictions on new data.

Note

Recall that when we say a model has “less variance,” we mean it makes less-variable predictions on new data, because it is not as sensitive to the noise in the training set.
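To make this concrete, every regularized regression technique in this chapter minimizes a loss of the same general form: the familiar sum of squared residuals from ordinary least squares, plus a penalty that grows with the size of the parameters. As a sketch in general notation (the specific penalties appear in the sections that follow):

\text{loss} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \, P(\beta)

Here P(β) is the sum of the squared parameters for ridge (the L2 penalty), the sum of their absolute values for LASSO (the L1 penalty), and a weighted mix of the two for elastic net, while λ ≥ 0 controls how strongly the parameters are shrunk toward 0 (with λ = 0, we recover ordinary least squares).

If you’d like to see shrinkage in action before we work through each technique, the following minimal sketch fits ridge and LASSO models at increasing penalty strengths and prints the learned coefficients. (I’m assuming Python with scikit-learn here purely for illustration; the simulated data and penalty values are made up.) Notice that the ridge coefficients shrink smoothly toward 0, while LASSO sets some of them to exactly 0.

# A minimal sketch of shrinkage, assuming Python and scikit-learn
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Simulate data where one predictor is truly useless
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
true_coefs = np.array([3.0, -2.0, 0.0, 0.5])
y = X @ true_coefs + rng.normal(scale=1.0, size=100)

# alpha plays the role of lambda: larger values mean a stronger penalty
for alpha in (0.1, 1.0, 10.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    lasso = Lasso(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: ridge={np.round(ridge.coef_, 2)} "
          f"lasso={np.round(lasso.coef_, 2)}")

As the penalty strength grows, both sets of coefficients move toward 0; this is exactly the “shrinkage” that trades a little bias for a reduction in variance.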

11.2. What is ridge regression?

11.3. What is the L2 norm, and how does ridge regression use it?

11.4. What is the L1 norm, and how does LASSO use it?

11.5. What is elastic net?

11.6. Building your first ridge, LASSO, and elastic net models

11.7. Benchmarking ridge, LASSO, elastic net, and OLS against each other

11.8. Strengths and weaknesses of ridge, LASSO, and elastic net

Summary

Solutions to exercises
