This chapter covers:
- What linear regression is
- What performance metrics we use for regression tasks
- How to use machine learning algorithms to impute missing values
- How to perform feature selection algorithmically
- How to combine preprocessing wrappers in mlr
Our first stop in regression brings us to linear regression. A classical and commonly used statistical method, linear regression builds predictive models by estimating the strength of the relationship between our predictor variables and our outcome variable. Linear regression is so named because it assumes the relationships between the predictor variables and the outcome variable are linear. Linear regression can handle both continuous and categorical predictor variables, and I’ll show you how in this chapter.
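To give you a flavor of where we're heading, here is a minimal sketch (not the chapter's worked example) of fitting a linear regression learner with mlr. The built-in mtcars data and the choice of predictors are illustrative assumptions; converting `am` to a factor shows a categorical predictor sitting alongside continuous ones.

```r
# A minimal sketch, assuming the built-in mtcars data: one categorical and
# two continuous predictors of fuel economy (mpg).
library(mlr)

carsData <- mtcars[, c("mpg", "wt", "hp", "am")]
carsData$am <- factor(carsData$am)                 # categorical predictor

carsTask <- makeRegrTask(data = carsData, target = "mpg")
lmLearner <- makeLearner("regr.lm")
lmModel <- train(lmLearner, carsTask)

# The underlying lm() coefficients estimate each predictor's linear
# relationship with the outcome.
summary(getLearnerModel(lmModel))
```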
By the end of this chapter, I hope you’ll understand a general approach to regression problems with mlr and how it differs from classification. In particular, you’ll understand the different performance metrics we use for regression tasks, because mean misclassification error (mmce) is no longer meaningful. I’ll also show you, as I promised in chapter 4, more sophisticated approaches to missing-value imputation and feature selection. Finally, I’ll cover how to combine as many preprocessing steps as we like using sequential wrappers, so that we can include them in our cross-validation.
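As a taster of those pieces, the sketch below continues the assumptions from the previous snippet (the `carsTask` object and a plain `regr.lm` learner): it cross-validates the learner with regression-appropriate measures (MSE and RMSE) in place of mmce, and shows how an imputation wrapper can be nested inside a feature-selection wrapper. The specific settings are illustrative assumptions, not the chapter's worked example.

```r
# A minimal sketch of the ideas previewed above; carsTask comes from the
# earlier snippet and has no missing values, so the impute wrapper here only
# illustrates how the wrappers nest.
library(mlr)

kFold <- makeResampleDesc("CV", iters = 10)

# Regression measures: mse and rmse replace mmce, which is classification-only.
lmCV <- resample(learner = makeLearner("regr.lm"), task = carsTask,
                 resampling = kFold, measures = list(mse, rmse))
lmCV$aggr

# Nesting preprocessing wrappers: impute numeric missing values with the mean,
# then wrap that learner in a sequential forward feature-selection wrapper, so
# both steps are repeated inside every fold of any outer cross-validation.
imputeWrapper <- makeImputeWrapper("regr.lm",
                                   classes = list(numeric = imputeMean()))
featSelWrapper <- makeFeatSelWrapper(imputeWrapper,
                                     resampling = makeResampleDesc("CV", iters = 5),
                                     control = makeFeatSelControlSequential(method = "sfs"))
```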