
5 Sequential Ensembles: Gradient Boosting


This chapter covers

  • Using gradient descent to optimize loss functions for training models
  • Understanding how gradient boosting works and implementing it
  • Training fast gradient-boosting models with histogram-based splitting for tree learning
  • Introducing LightGBM: a powerful framework for gradient boosting
  • Avoiding overfitting with LightGBM in practice
  • Using custom loss functions with LightGBM

The last chapter introduced boosting, in which we train weak learners sequentially and “boost” them into a strong ensemble model. One important sequential ensemble method covered there is adaptive boosting, or AdaBoost.

AdaBoost is a foundational boosting algorithm that trains each new weak learner to fix the misclassifications of the previous one. It does this by maintaining and adaptively updating weights on the training examples. These weights reflect how badly each example is misclassified and tell the base learning algorithm which examples to prioritize.
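
To make this concrete, here is a minimal sketch of an AdaBoost-style weight update (illustrative code with assumed names, not the implementation from the previous chapter). Misclassified examples have their weights increased, so the next weak learner focuses on them.

import numpy as np

def update_weights(weights, y_true, y_pred):
    """One AdaBoost-style weight update (illustrative sketch)."""
    misclassified = (y_true != y_pred)
    # Weighted error rate of the current weak learner
    err = np.sum(weights * misclassified) / np.sum(weights)
    # Influence of this weak learner in the final ensemble
    alpha = np.log((1 - err) / err)
    # Increase the weights of misclassified examples ...
    weights = weights * np.exp(alpha * misclassified)
    # ... and renormalize so the weights sum to 1
    return weights / np.sum(weights), alpha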

In this chapter, we look at an alternative way to convey misclassification information to the base learning algorithm during boosting: the gradients of the loss function.
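
To preview why gradients can play this role, consider the squared-error loss on a small regression example (made-up numbers, purely illustrative): the negative gradient of the loss with respect to the current predictions is exactly the residual, which is what gradient boosting trains the next weak learner to predict.

import numpy as np

# Squared-error loss: L(y, f) = 0.5 * (y - f)**2
y = np.array([3.0, -1.5, 2.0])   # true targets (made-up numbers)
f = np.array([2.5, -1.0, 2.5])   # current ensemble predictions

negative_gradient = -(f - y)     # -dL/df = y - f
residuals = y - f

print(np.allclose(negative_gradient, residuals))   # True: negative gradients are residuals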

5.1 Gradient Descent for Minimization

5.1.1 Gradient Descent with an Illustrative Example

5.1.2 Gradient Descent over Loss Functions for Training

5.2 Gradient Boosting: Gradient Descent + Boosting

5.2.1 Intuition: Learning with Residuals

5.2.2 Implementing Gradient Boosting

5.2.3 Gradient Boosting with scikit-learn

5.2.4 Histogram-based Gradient Boosting

5.3 LightGBM: A Framework for Gradient Boosting

5.3.1 What Makes LightGBM “Light”?

5.3.2 Gradient Boosting with LightGBM

5.4 LightGBM in Practice

5.4.1 Learning Rate

5.4.2 Early Stopping

5.4.3 Custom Loss Functions

5.5 Case Study: Document Retrieval

5.5.1 The LETOR Data Set

5.5.2 Document Retrieval with LightGBM

5.6 Summary