2 Generalization: A Classical View

This chapter covers

  • Step-by-step implementation of training a linear regression model
  • Key mathematical concepts in a typical model-training workflow: the data, the model, the cost function, and the optimization algorithm
  • The classical view of generalization in statistical modeling and machine learning
  • A closer look at crucial generalization concepts, including empirical risk minimization, the bias-variance trade-off, and underfitting versus overfitting

2.1 The data

2.1.1 Sampling from the underlying data distribution

2.1.2 The train-test split

2.2 The model

2.2.1 The prediction function

2.2.2 The bias trick

2.2.3 Implementing the prediction function

2.3 The cost function

2.3.1 Expressing the cost function with linear algebra

2.4 The optimization algorithm

2.4.1 The problem of multiple minima

2.4.2 The closed-form solution of linear regression

2.4.3 The gradient descent algorithm

2.4.4 Different types of gradient descent

2.4.5 The stochastic gradient descent algorithm

2.4.6 The impact of the learning rate

2.5 Improving predictive performance

2.5.1 Augmented representation via feature engineering

2.5.2 Quadratic basis function

2.6 Empirical risk minimization

2.6.1 More on the model

2.6.2 The bias-variance decomposition

2.6.3 Understanding bias and variance using the bootstrap

2.6.4 Reduced generalization with high model complexity

2.7 Summary