This chapter covers
- Decision trees and the decision tree learning algorithm
- Random forest: putting multiple trees together into one model
- Gradient boosting as an alternative way of combining decision trees
In Chapter 3, we described the binary classification problem and used logistic regression to predict whether a customer is going to churn.
In this chapter, we’ll also solve a binary classification problem, but using a different family of machine learning models: tree-based models. The simplest tree-based model is the decision tree, which is nothing more than a sequence of if-then-else rules put together. Multiple decision trees can be combined into an ensemble to achieve better performance. We’ll cover two tree-based ensemble models: random forest and gradient boosting.
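To make the if-then-else intuition concrete, here is a minimal sketch of what a tiny decision tree could look like if written out as plain Python code. The feature names and thresholds are made up for illustration; they are not the rules a trained model would actually learn.

```python
# A decision tree is just nested if-then-else rules applied to a customer's
# features. The features and thresholds below are purely illustrative.
def predict_default(client):
    if client['records'] == 'yes':       # customer has past-due records
        if client['income'] <= 1000:
            return 'default'
        else:
            return 'ok'
    else:                                # no past-due records
        if client['assets'] <= 3000:
            return 'default'
        else:
            return 'ok'

# Example usage with a hypothetical client
client = {'records': 'yes', 'income': 800, 'assets': 5000}
print(predict_default(client))  # 'default'
```

A learning algorithm does essentially the same thing automatically: it looks at the data and decides which features to split on and at which thresholds.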
The project we prepared for this chapter is default prediction: we’ll predict whether or not a customer will pay back a loan. We’ll learn how to train decision tree and random forest models with Scikit-Learn and explore XGBoost, a library for implementing gradient boosting models.
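As a quick preview of the libraries we’ll use, here is a minimal sketch of how these models are created and trained. The synthetic data and parameter values are placeholders, not the actual dataset or settings used later in the chapter.

```python
# Minimal sketch: fitting the three model families covered in this chapter.
# The data here is synthetic; the real project uses a loan-application dataset.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb

# Placeholder feature matrix X and binary target y
X, y = make_classification(n_samples=200, n_features=5, random_state=1)

# A single decision tree
dt = DecisionTreeClassifier(max_depth=6)
dt.fit(X, y)

# A random forest: an ensemble of many trees
rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X, y)

# Gradient boosting with XGBoost
model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X, y)
```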
Imagine that you work at a bank. When you receive a loan application, you need to make sure that if you lend the money, the customer will be able to pay it back. With every application, there’s a risk of default: the failure to return the money.