Part 3 Ensembles in the wild: Adapting ensemble methods to your data

The world of data is a wild and dangerous place for a data scientist. We must contend with different types of data, such as counts, categories, and strings, all strewn with missing values and noise. We are asked to build predictive models for different types of tasks: binary classification, multiclass classification, regression, and ranking.

We have to build our machine-learning pipelines and preprocess our data with care to avoid data leakage. Our models have to be accurate, fast, robust, and meme-worthy (OK, that last one is probably optional). After all this, we often end up with models that do the job they were trained for but are ultimately black boxes that no one understands.

In this final part of the book, you'll learn how to tackle these challenges, armed with the arsenal of ensemble methods from the previous part of the book, as well as a few new ones. This is the last leg of your journey from ensembler-in-training to seasoned ensembler and explorer of the wild world of data.

Chapter 7 covers ensemble learning for regression tasks, where you'll learn how to adapt different ensemble methods to handle continuous and count-valued labels.