In chapter 3, we described the binary classification problem and used logistic regression to predict whether a customer is going to churn.
In this chapter, we also solve a binary classification problem, but we use a different family of machine learning models: tree-based models. A decision tree, the simplest tree-based model, is nothing but a sequence of if-then-else rules put together. We can combine multiple decision trees into an ensemble to achieve better performance. We cover two tree-based ensemble models: random forest and gradient boosting.
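To make the "sequence of if-then-else rules" idea concrete, here is a minimal sketch of a hand-written decision tree. The feature names and thresholds are purely illustrative, not taken from the chapter's dataset:

```python
# A decision tree expressed directly as nested if-then-else rules.
# Feature names and thresholds below are hypothetical, for illustration only.
def predict_default(client):
    if client["debt_to_income"] > 0.4:
        if client["missed_payments"] > 2:
            return "default"
        return "ok"
    if client["income"] < 20_000:
        return "default"
    return "ok"

client = {"debt_to_income": 0.5, "missed_payments": 3, "income": 50_000}
print(predict_default(client))  # → default
```

Training a decision tree amounts to learning these rules (which feature to split on and at which threshold) automatically from data, rather than writing them by hand.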
The project we prepared for this chapter is default prediction: we predict whether a customer will fail to pay back a loan. We learn how to train decision trees and random forest models with Scikit-learn, and we explore XGBoost, a library that implements gradient boosting models.
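As a preview of the Scikit-learn workflow used throughout the chapter, here is a minimal sketch that fits a decision tree on synthetic data (the chapter itself uses a real loan dataset; the features and labels below are made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: two features, binary "default" label.
rng = np.random.default_rng(1)
X = rng.uniform(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # made-up labeling rule

# Fit a shallow tree; max_depth limits how many if-then-else rules it can learn.
model = DecisionTreeClassifier(max_depth=3, random_state=1)
model.fit(X, y)

print(model.score(X, y))  # training accuracy
```

Random forest (`sklearn.ensemble.RandomForestClassifier`) follows the same fit/predict interface, which is why switching between models in Scikit-learn requires only minimal code changes.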