This chapter covers:
- What linear regression is
- What performance metrics we use for regression tasks
- How to use machine learning algorithms to impute missing values
- How to perform feature selection algorithmically
- How to combine preprocessing wrappers in mlr
Our first stop in regression brings us to linear regression. A classical and commonly used statistical method, linear regression builds predictive models by estimating the strength of the relationship between our predictor variables and our outcome variable. Linear regression is so named because it assumes the relationships between the predictor variables and the outcome variable are linear. Linear regression can handle both continuous and categorical predictor variables, and I’ll show you how in this chapter.
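To give you a flavor of where we're heading, here is a minimal sketch (not the chapter's worked example) of fitting a linear regression learner with mlr. The built-in mtcars data and the choice of predictors are illustrative assumptions; converting `am` to a factor shows a categorical predictor sitting alongside continuous ones.

```r
# A minimal sketch, assuming the built-in mtcars data: one categorical and
# two continuous predictors of fuel economy (mpg).
library(mlr)

carsData <- mtcars[, c("mpg", "wt", "hp", "am")]
carsData$am <- factor(carsData$am)                 # categorical predictor

carsTask <- makeRegrTask(data = carsData, target = "mpg")
lmLearner <- makeLearner("regr.lm")
lmModel <- train(lmLearner, carsTask)

# The underlying lm() coefficients estimate each predictor's linear
# relationship with the outcome.
summary(getLearnerModel(lmModel))
```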
By the end of this chapter, I hope you’ll understand a general approach to regression problems with mlr and how it differs from classification. In particular, you’ll understand the different performance metrics we use for regression tasks, because mean misclassification error (mmce) is no longer meaningful. I’ll also show you, as I promised in chapter 4, more sophisticated approaches to missing-value imputation and feature selection. Finally, I’ll cover how to combine as many preprocessing steps as we like using sequential wrappers, so that we can include them in our cross-validation.
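As a taster of those pieces, the sketch below continues the assumptions from the previous snippet (the `carsTask` object and a plain `regr.lm` learner): it cross-validates the learner with regression-appropriate measures (MSE and RMSE) in place of mmce, and shows how an imputation wrapper can be nested inside a feature-selection wrapper. The specific settings are illustrative assumptions, not the chapter's worked example.

```r
# A minimal sketch of the ideas previewed above; carsTask comes from the
# earlier snippet and has no missing values, so the impute wrapper here only
# illustrates how the wrappers nest.
library(mlr)

kFold <- makeResampleDesc("CV", iters = 10)

# Regression measures: mse and rmse replace mmce, which is classification-only.
lmCV <- resample(learner = makeLearner("regr.lm"), task = carsTask,
                 resampling = kFold, measures = list(mse, rmse))
lmCV$aggr

# Nesting preprocessing wrappers: impute numeric missing values with the mean,
# then wrap that learner in a sequential forward feature-selection wrapper, so
# both steps are repeated inside every fold of any outer cross-validation.
imputeWrapper <- makeImputeWrapper("regr.lm",
                                   classes = list(numeric = imputeMean()))
featSelWrapper <- makeFeatSelWrapper(imputeWrapper,
                                     resampling = makeResampleDesc("CV", iters = 5),
                                     control = makeFeatSelControlSequential(method = "sfs"))
```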