
5 Sequential Ensembles: Gradient Boosting


This chapter covers

  • Using gradient descent to optimize loss functions for training models
  • Understanding how gradient boosting works and implementing it
  • Training fast gradient-boosting models with histogram-based splitting for tree learning
  • Introducing LightGBM: a powerful framework for gradient boosting
  • Avoiding overfitting with LightGBM in practice
  • Using custom loss functions with LightGBM

The last chapter introduced boosting, in which we train weak learners sequentially and “boost” them into a strong ensemble model. One important sequential ensemble method covered there is adaptive boosting, or AdaBoost.

AdaBoost is a foundational boosting algorithm that trains each new weak learner to fix the misclassifications of the previous one. It does this by maintaining and adaptively updating weights on the training examples. These weights reflect how badly each example is misclassified and tell the base learning algorithm which examples to prioritize.
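
To make this concrete, here is a minimal sketch of an AdaBoost-style weight update (illustrative code with assumed names, not the implementation from the previous chapter). Misclassified examples have their weights increased, so the next weak learner focuses on them.

import numpy as np

def update_weights(weights, y_true, y_pred):
    """One AdaBoost-style weight update (illustrative sketch)."""
    misclassified = (y_true != y_pred)
    # Weighted error rate of the current weak learner
    err = np.sum(weights * misclassified) / np.sum(weights)
    # Influence of this weak learner in the final ensemble
    alpha = np.log((1 - err) / err)
    # Increase the weights of misclassified examples ...
    weights = weights * np.exp(alpha * misclassified)
    # ... and renormalize so the weights sum to 1
    return weights / np.sum(weights), alpha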

In this chapter, we look at an alternative way to convey misclassification information to the base learning algorithm during boosting: the gradients of the loss function.
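
To preview why gradients can play this role, consider the squared-error loss on a small regression example (made-up numbers, purely illustrative): the negative gradient of the loss with respect to the current predictions is exactly the residual, which is what gradient boosting trains the next weak learner to predict.

import numpy as np

# Squared-error loss: L(y, f) = 0.5 * (y - f)**2
y = np.array([3.0, -1.5, 2.0])   # true targets (made-up numbers)
f = np.array([2.5, -1.0, 2.5])   # current ensemble predictions

negative_gradient = -(f - y)     # -dL/df = y - f
residuals = y - f

print(np.allclose(negative_gradient, residuals))   # True: negative gradients are residuals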

5.1 Gradient Descent for Minimization

5.1.1 Gradient Descent with an Illustrative Example

5.1.2 Gradient Descent over Loss Functions for Training

5.2 Gradient Boosting: Gradient Descent + Boosting

5.2.1 Intuition: Learning with Residuals

5.2.2 Implementing Gradient Boosting

5.2.3 Gradient Boosting with scikit-learn

5.2.4 Histogram-based Gradient Boosting

5.3 LightGBM: A Framework for Gradient Boosting

5.3.1 What Makes LightGBM “Light”?

5.3.2 Gradient Boosting with LightGBM

5.4 LightGBM in Practice

5.4.1 Learning Rate

5.4.2 Early Stopping

5.4.3 Custom Loss Functions

5.5 Case Study: Document Retrieval

5.5.1 The LETOR Data Set

5.5.2 Document Retrieval with LightGBM

5.6 Summary