3 Generalization: A Modern View
This chapter covers
- The modern view of generalization in machine learning and deep learning
- The “double descent” phenomenon of over-parameterized models and its connection with the classical bias-variance tradeoff
- The implementation of a smoothing spline model as an extension of the polynomial regression model with a smoothness penalty
- The extension of the “double descent” phenomenon to both training iterations and sample size
One of the biggest takeaways from chapter 2 is the bias-variance tradeoff. According to the classical view of generalization, a model with excessive complexity incurs high variance and overfits the training data, and therefore fails to generalize to future test data. A properly trained model should balance underfitting and overfitting: it should be complex enough to capture the underlying relationship, yet simple enough not to latch onto spurious patterns caused by random noise in the data. According to classical statistics, this best model sits at the “sweet spot” of the U-shaped risk curve for the test set, as shown in figure 1.7 in chapter 1.
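To make that U-shaped test curve concrete, here is a minimal sketch (not code from the book) that fits ordinary least-squares polynomials of increasing degree to noisy samples from a sine curve and prints the training and test errors. The ground-truth function, noise level, sample sizes, and degrees are illustrative assumptions; under them, the training error keeps shrinking as the degree grows, while the test error typically falls and then rises again, with the “sweet spot” at an intermediate degree.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n, noise=0.3):
    """Noisy samples from a smooth ground-truth curve (illustrative choice)."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n)
    return x, y

x_train, y_train = make_data(30)   # small training set, prone to overfitting
x_test, y_test = make_data(500)    # large held-out set to estimate the test risk

for degree in (1, 3, 5, 9, 15):
    # Least-squares polynomial fit of the given degree (model complexity knob)
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    test_mse = np.mean((model(x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```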