This chapter covers:
- What does overfitting look like for regression problems?
- What is regularization?
- What are ridge regression, LASSO, and elastic net?
- What are the L1 and L2 norms and how are they used to shrink parameters?
Our societies are full of checks and balances. In our political systems, parties balance each other to (in theory) find solutions that sit at neither extreme of each other’s views. Professional fields, such as financial services, have regulatory bodies that prevent wrongdoing and ensure that what practitioners say and do is truthful and correct. It turns out we can apply our own form of regulation to the machine learning process, to prevent algorithms from overfitting the training set. In machine learning, we call this regulation regularization.
In this section I’ll explain what regularization is and why it’s useful. Regularization (also sometimes called shrinkage) is a technique that prevents a model's parameters from becoming too large and "shrinks" them toward zero. The result is a model that, when making predictions on new data, has less variance.
Note
Recall that when we say a model has "less variance," we mean it makes less variable predictions on new data, because it is not as sensitive to the noise in the training set.
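To make the shrinkage effect concrete, here is a minimal sketch in Python using scikit-learn (an assumption for illustration; the chapter's own examples may use a different toolkit). It fits the same small, noisy training set with ordinary least squares and with ridge regression, then prints both sets of coefficients so you can see the regularized ones pulled toward zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Simulate a small, noisy training set: few samples and noisy targets,
# the kind of situation where ordinary least squares tends to overfit.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_coefs = np.array([2.0, -1.0, 0.0, 0.0, 0.5])  # hypothetical "true" parameters
y = X @ true_coefs + rng.normal(scale=2.0, size=20)

# Unregularized least squares: coefficients are free to grow large.
ols = LinearRegression().fit(X, y)

# Ridge regression: the penalty (controlled by alpha) shrinks
# coefficients toward zero.
ridge = Ridge(alpha=10.0).fit(X, y)

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
```

Increasing alpha strengthens the penalty and shrinks the coefficients further toward zero; with alpha set to 0, ridge regression reduces to ordinary least squares.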