6 Sequential ensembles: Newton boosting

 

This chapter covers

  • Using Newton’s descent to optimize loss functions for training models
  • Implementing and understanding how Newton boosting works
  • Learning with regularized loss functions
  • Introducing XGBoost as a powerful framework for Newton boosting
  • Avoiding overfitting with XGBoost

In the previous two chapters, we saw two approaches to constructing sequential ensembles. In chapter 4, we introduced adaptive boosting (AdaBoost), which uses example weights to identify the most misclassified examples. In chapter 5, we introduced gradient boosting, which uses gradients (residuals) for the same purpose. The fundamental intuition behind both boosting methods is to target the most misclassified (essentially, the worst-behaving) examples at every iteration to improve classification.
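As a preview of the key ingredient this chapter adds, the short sketch below contrasts a basic gradient-descent update, which uses only first-derivative information, with a Newton update, which also uses second-derivative (curvature) information to rescale the step. This is a minimal illustrative sketch, not an example from the book: the loss function f(x) = x^2 - 4 ln(x), the starting point, and the learning rate are all made-up choices. Section 6.1 develops Newton's method for minimization in detail.

import numpy as np

# A simple, made-up 1-d "loss": f(x) = x**2 - 4*ln(x), convex for x > 0.
# Its true minimizer is x = sqrt(2) ~ 1.4142.
def f(x):
    return x**2 - 4 * np.log(x)

def grad(x):   # first derivative f'(x)
    return 2 * x - 4 / x

def hess(x):   # second derivative f''(x)
    return 2 + 4 / x**2

x_gd, x_newton = 3.0, 3.0
lr = 0.1       # learning rate, needed only by gradient descent

for step in range(10):
    # Gradient descent: step along the negative gradient, scaled by a learning rate.
    x_gd -= lr * grad(x_gd)
    # Newton's method: divide the gradient by the curvature; no learning rate needed.
    x_newton -= grad(x_newton) / hess(x_newton)

print(f"gradient descent after 10 steps: x = {x_gd:.4f}")
print(f"Newton's method after 10 steps:  x = {x_newton:.4f}")

Because the Newton step accounts for curvature, it typically reaches the minimizer in far fewer iterations than plain gradient descent on this kind of smooth, convex function; the same second-order idea is what Newton boosting applies to loss functions over ensembles.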

6.1 Newton’s method for minimization

 
 

6.1.1 Newton’s method with an illustrative example

 

6.1.2 Newton’s descent over loss functions for training

 
 
 

6.2 Newton boosting: Newton’s method + boosting

 

6.2.1 Intuition: Learning with weighted residuals

 
 

6.2.2 Intuition: Learning with regularized loss functions

 
 

6.2.3 Implementing Newton boosting

 
 
 

6.3 XGBoost: A framework for Newton boosting

 
 
 

6.3.1 What makes XGBoost “extreme”?

 

6.3.2 Newton boosting with XGBoost

 

6.4 XGBoost in practice

 
 
 

6.4.1 Learning rate

 
 

6.4.2 Early stopping

 
 

6.5 Case study redux: Document retrieval

 

6.5.1 The LETOR data set

 
 
 

6.5.2 Document retrieval with XGBoost

 
 