5 Sequential Ensembles: Gradient Boosting
This chapter covers
- Using gradient descent to optimize loss functions for training models
- Understanding how gradient boosting works and implementing it
- Training fast gradient-boosting models with histogram-based splitting for tree learning
- Introducing LightGBM: a powerful framework for gradient boosting
- Avoiding overfitting with LightGBM in practice
- Using custom loss functions with LightGBM
The previous chapter introduced boosting, in which we train weak learners sequentially and “boost” them into a strong ensemble model. One important sequential ensemble method introduced there is adaptive boosting, or AdaBoost.
AdaBoost is a foundational boosting model that trains each new weak learner to fix the misclassifications of the previous one. It does this by maintaining and adaptively updating weights on the training examples. These weights reflect the extent of misclassification and tell the base learning algorithm which training examples to prioritize.
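To make the weight-updating idea concrete, here is a minimal sketch of a single AdaBoost-style round (not the book's implementation), assuming scikit-learn's DecisionTreeClassifier as the base learner and a toy binary task with labels in {-1, +1}:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary classification problem, labels converted to {-1, +1}
X, y = make_classification(n_samples=200, random_state=42)
y = 2 * y - 1

# Start with uniform weights over the training examples
weights = np.full(len(X), 1 / len(X))

# Train a weak learner (a decision stump) on the weighted examples
stump = DecisionTreeClassifier(max_depth=1, random_state=42)
stump.fit(X, y, sample_weight=weights)
pred = stump.predict(X)

# Weighted error and learner coefficient (standard AdaBoost update)
err = np.sum(weights * (pred != y)) / np.sum(weights)
alpha = 0.5 * np.log((1 - err) / err)

# Increase the weights of misclassified examples and decrease the rest,
# so the next weak learner prioritizes the hard examples
weights *= np.exp(-alpha * y * pred)
weights /= weights.sum()
```

Each round of boosting repeats these steps, so the misclassification information is carried entirely by the example weights.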
In this chapter, we look at an alternative to example weights for conveying misclassification information to a base learning algorithm during boosting: loss function gradients.
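As a preview of that idea (a hedged sketch, not the chapter's final implementation), consider the squared-error loss: its negative gradient with respect to the current model's predictions is simply the residual, and each new weak learner is trained to predict that residual rather than fit reweighted examples:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=42)

# Current model: start with a constant prediction (the mean target)
current_pred = np.full_like(y, y.mean())

# For squared-error loss L = 0.5 * (y - f(x))^2, the negative gradient
# with respect to the prediction f(x) is the residual y - f(x)
negative_gradient = y - current_pred

# The next weak learner is trained on these gradients instead of
# reweighted examples
tree = DecisionTreeRegressor(max_depth=3, random_state=42)
tree.fit(X, negative_gradient)

# Take a small step in the direction suggested by the weak learner
learning_rate = 0.1
current_pred += learning_rate * tree.predict(X)
```

This gradient-based view is what the rest of the chapter develops into gradient boosting.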