Chapter 8

8 Regression with lines: linear regression and generalized additive models


This chapter covers:

  • What is linear regression?
  • What performance metrics we use for regression tasks
  • How to use machine learning algorithms to impute missing values
  • How to perform feature selection algorithmically
  • How to combine preprocessing wrappers in mlr
  • What are generalized additive models (GAMs)?

Our first stop in regression brings us to linear regression and generalized additive models. Both of these techniques rely on the equation of a straight line to build models that predict a continuous variable. Each approach allows us to combine categorical and continuous predictor variables, but they differ in how complex they allow the relationship between the predictor and outcome variables to be. While linear regression models the relationship between each predictor variable and the outcome as a straight line, generalized additive models are more flexible (literally), as they allow for complex, non-linear relationships between the predictors and the outcome.
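As a quick preview of this difference (a minimal sketch, not from the chapter: it uses simulated data and the `gam()` and `s()` functions from the mgcv package rather than the mlr workflow we build later), we can fit both a straight-line model and a GAM to the same non-linear data and compare how well each explains the outcome:

```r
# Sketch: a straight-line fit vs. a flexible GAM fit on simulated data.
# Assumes the mgcv package is installed.
library(mgcv)

set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)    # a clearly non-linear relationship
dat <- data.frame(x = x, y = y)

linMod <- lm(y ~ x, data = dat)       # models y as a straight line in x
gamMod <- gam(y ~ s(x), data = dat)   # models y as a smooth function of x

# The GAM's smooth term captures the sine-shaped pattern that a
# straight line cannot, so it explains far more of the variance.
c(lm = summary(linMod)$r.squared, gam = summary(gamMod)$r.sq)
```

Because the true relationship here is a sine wave, the straight-line model barely fits at all, while the GAM's smooth term follows the curve closely. We return to GAMs properly in section 8.5.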

8.1  What is linear regression?

8.1.1  What if we have multiple predictors?

8.1.2  What if my predictors are categorical?

8.2  When the relationship isn’t linear: polynomial terms

8.3  When we need even more flexibility: splines and generalized additive models

8.4  Building our first linear regression model

8.4.1  Loading and exploring the Ozone dataset

8.4.2  Imputing missing values

8.4.3  Automating feature selection

8.4.4  Including imputation and feature selection in our cross-validation

8.4.5  Interpreting the model

8.5  Building our first GAM

8.6  Strengths and weaknesses of linear regression and GAMs

8.7  Summary

8.8  Solutions to exercises