6 Leveraging the best practices for machine learning with tabular data
This chapter covers
- Processing features with more advanced methods
- Selecting useful features for lighter, more understandable models
- Optimizing hyperparameters to maximize your models' performance
- Mastering the specific characteristics and options of GBDTs
In the previous chapter, we discussed decision trees, their characteristics and limitations, and the ensemble models built on them, both those based on random resampling, such as Random Forests, and those based on boosting, such as Gradient Boosting. Since boosting solutions are the current state of the art in tabular data modeling, we explained at length how gradient boosting works and how to optimize its predictions. In particular, we presented two solid gradient boosting implementations, XGBoost and LightGBM, which are proving to be the best options available to a data scientist dealing with tabular data.