This chapter covers
- Decision trees and their ensembles
- Gradient boosting decision trees
- Scikit-learn’s options for gradient boosting decision trees
- XGBoost algorithm and its innovations
- How LightGBM algorithm works
So far, we have explored machine learning algorithms based on linear models because they can handle tabular problems ranging from datasets with a few rows and columns to those with millions of rows and many columns. In addition, linear models are fast to train and fast to produce predictions. They are also relatively easy to understand, explain, and tweak. Finally, linear models introduce many concepts we will keep building on in this book, such as L1 and L2 regularization and gradient descent.
This chapter will discuss a different classical machine learning algorithm: decision trees. Decision trees are the foundation of ensemble models such as random forests and boosting. We will focus especially on gradient boosting, a machine learning ensemble algorithm, and on its implementations eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), which are considered state-of-the-art solutions for tabular data.
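To preview the difference an ensemble makes, here is a minimal sketch comparing a single decision tree with a gradient boosting ensemble in scikit-learn. The synthetic dataset, split proportions, and default hyperparameters are illustrative choices, not recommendations from this chapter:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic tabular dataset: 1,000 rows, 20 numeric columns (illustrative)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# A single decision tree versus an ensemble of boosted trees
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
gbdt = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

print(f"Single tree accuracy:       {tree.score(X_test, y_test):.3f}")
print(f"Gradient boosting accuracy: {gbdt.score(X_test, y_test):.3f}")
```

On held-out data like this, the boosted ensemble typically outperforms the single tree, which is the behavior the rest of the chapter explains.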