This chapter covers
- Using the k-nearest neighbors algorithm for regression
- Using tree-based algorithms for regression
- Comparing k-nearest neighbors, random forest, and XGBoost models
You’re going to find this chapter a breeze. This is because you’ve done everything in it before (sort of). In chapter 3, I introduced you to the k-nearest neighbors (kNN) algorithm as a tool for classification. In chapter 7, I introduced you to decision trees and then expanded on this in chapter 8 to cover random forest and XGBoost for classification. Well, conveniently, these algorithms can also be used to predict continuous variables. So in this chapter, I’ll help you extend these skills to solve regression problems.
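To make the extension concrete before we dive in, here is a minimal sketch of kNN used for regression: instead of taking a majority vote among the k nearest neighbors' classes, the prediction is simply the mean of their values of the continuous outcome. The sketch assumes Python with scikit-learn and uses made-up toy data; it illustrates the principle rather than the workflow we'll build in this chapter.

```python
# Illustrative sketch only: kNN regression predicts the mean of the
# k nearest neighbors' outcome values (toy data, not this chapter's dataset).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# One predictor, one continuous outcome
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X, y)

# For x = 2.6, the three nearest training cases are x = 2, 3, and 4,
# so the prediction is the mean of their y values: (1.9 + 3.2 + 3.9) / 3 = 3.0
print(knn.predict([[2.6]]))
```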
By the end of this chapter, I hope you’ll understand how kNN and tree-based algorithms can be extended to predict continuous variables. As you learned in chapter 7, decision trees suffer from a tendency to overfit their training data and so are often vastly improved by using ensemble techniques. Therefore, in this chapter, you’ll train a random forest model and an XGBoost model, and benchmark their performance against that of the kNN algorithm.
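As a preview of the kind of comparison we'll build up to, the sketch below cross-validates a kNN, a random forest, and an XGBoost regressor on the same data and compares their mean squared error. It assumes scikit-learn and the xgboost Python package, and it uses a synthetic dataset purely for illustration; treat the specific settings (number of neighbors, number of trees, number of folds) as placeholders rather than recommendations.

```python
# Illustrative benchmark sketch: compare kNN, random forest, and XGBoost
# regressors by cross-validated error on a synthetic dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor  # assumes the xgboost package is installed

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=1)

models = {
    "kNN": KNeighborsRegressor(n_neighbors=5),
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=1),
    "XGBoost": XGBRegressor(n_estimators=200, learning_rate=0.1, random_state=1),
}

for name, model in models.items():
    # 5-fold cross-validated mean squared error
    # (scikit-learn reports the score negated, so flip the sign)
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: mean CV MSE = {mse:.1f}")
```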