In chapter 3, we described the binary classification problem and used the logistic regression model to predict if a customer is going to churn.
In this chapter, we also solve a binary classification problem, but we use a different family of machine learning models: tree-based models. A decision tree, the simplest tree-based model, is essentially a sequence of if-then-else rules put together. We can combine multiple decision trees into an ensemble to achieve better performance. We cover two tree-based ensemble models: random forest and gradient boosting.
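To make the idea concrete, here is a minimal sketch of a decision tree written directly as if-then-else rules. The feature names (`records`, `job`, `assets`) and the thresholds are hypothetical, chosen only to illustrate the structure; a real tree learned from data would pick its own features and split points.

```python
def predict_default(client):
    # A decision tree is just nested if-then-else rules.
    # This hand-written, two-level tree decides whether a loan
    # application looks risky (all thresholds are made up).
    if client["records"] == "yes":
        if client["job"] == "parttime":
            return "default"
        return "ok"
    else:
        if client["assets"] > 6000:
            return "ok"
        return "default"


client = {"records": "yes", "job": "parttime", "assets": 0}
print(predict_default(client))
```

Training a decision tree amounts to learning such rules automatically from data rather than writing them by hand.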
The project we prepared for this chapter is default prediction: we predict whether a customer will fail to pay back a loan. We learn how to train decision trees and random forest models with Scikit-learn and explore XGBoost, a library for implementing gradient boosting models.
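As a preview of the Scikit-learn API used throughout the chapter, the sketch below trains a decision tree and a random forest on a tiny made-up dataset. The feature values and labels are invented for illustration only; the chapter's project uses a real loan dataset.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Toy data: each row is [loan_amount, annual_income];
# label 1 means the customer defaulted (made-up values).
X = [[5000, 20000], [30000, 25000], [2000, 60000], [40000, 30000]]
y = [0, 1, 0, 1]

# A single decision tree.
tree = DecisionTreeClassifier(max_depth=2, random_state=1)
tree.fit(X, y)

# An ensemble of trees: a random forest.
forest = RandomForestClassifier(n_estimators=10, random_state=1)
forest.fit(X, y)

print(tree.predict([[3000, 50000]]))
print(forest.predict([[3000, 50000]]))
```

Both models share the same `fit`/`predict` interface, which is what lets us swap a single tree for an ensemble with almost no code changes.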
Imagine that we work at a bank. When we receive a loan application, we need to make sure that if we give the money, the customer will be able to pay it back. Every application carries a risk of default—the failure to return the money.