This chapter covers
- Working with linear regression
- Performance metrics for regression tasks
- Using machine learning algorithms to impute missing values
- Performing feature selection algorithmically
- Combining preprocessing wrappers in mlr
Our first stop in part 3, “Regression,” brings us to linear regression. A classical and commonly used statistical method, linear regression builds predictive models by estimating the strength of the relationship between our predictor variables and our outcome variable. It is so named because it assumes that the relationships between the predictor variables and the outcome variable are linear. Linear regression can handle both continuous and categorical predictor variables, and I’ll show you how in this chapter.
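To make the idea concrete before we dive in, here is a minimal sketch of fitting a linear regression with mlr. The built-in mtcars data and the mpg outcome are my own illustrative choices here, not the dataset we’ll work with later in the chapter.

```r
library(mlr)

# Define a regression task: predict mpg from the other mtcars variables
mtcarsTask <- makeRegrTask(data = mtcars, target = "mpg")

# Create a linear regression learner (wraps R's lm() function)
lmLearner <- makeLearner("regr.lm")

# Train the model and predict on the same data
lmModel <- train(lmLearner, mtcarsTask)
lmPred <- predict(lmModel, mtcarsTask)
```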
By the end of this chapter, I hope you’ll understand a general approach to regression problems with mlr, and how it differs from the approach we take for classification. In particular, you’ll understand the different performance metrics we use for regression tasks, because mean misclassification error (MMCE) is no longer meaningful. I’ll also show you, as I promised in chapter 4, more sophisticated approaches to missing-value imputation and feature selection. Finally, I’ll cover how to chain as many preprocessing steps as we like together using sequential wrappers, so we can include them all in our cross-validation.
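As a rough preview, and assuming the lmPred prediction object from the sketch above, this is what regression performance measures and chained preprocessing wrappers look like in mlr. The rpart imputation learner and the sequential search method shown here are illustrative assumptions, not the chapter’s final choices.

```r
# Regression measures replace mmce: mean squared error, its square root,
# and mean absolute error
performance(lmPred, measures = list(mse, rmse, mae))

# Wrappers can be nested so that imputation and feature selection are
# repeated inside every fold of cross-validation
imputeWrapper <- makeImputeWrapper(
  makeLearner("regr.lm"),
  classes = list(numeric = imputeLearner("regr.rpart"))
)

featSelWrapper <- makeFeatSelWrapper(
  imputeWrapper,
  resampling = makeResampleDesc("CV", iters = 10),
  control = makeFeatSelControlSequential(method = "sfbs")
)
```

Cross-validating featSelWrapper then treats imputation and feature selection as part of the model-building process, which is exactly why we wrap them rather than preprocessing the data once up front.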