Part 1 The basics of ensembles


You’ve probably heard a lot about “random forests,” “XGBoost,” or “gradient boosting.” Someone always seems to be using one or another of these to build cool applications or win Kaggle competitions. Have you ever wondered what this fuss is all about?

The fuss, it turns out, is all about ensemble methods, a powerful machine-learning paradigm that has found its way into all kinds of applications in health care, finance, insurance, recommendation systems, search, and a lot of other areas.

This book will introduce you to the wide world of ensemble methods, and this part will get you going. To paraphrase the incomparable Julie Andrews from The Sound of Music,

Let’s start at the very beginning,

A very good place to start.

When you read, you begin with A-B-C.

When you ensemble, you begin with fit-versus-complexity.

The first part of this book will gently introduce ensemble methods with a bit of intuition and a bit of theory on fit versus complexity (or the bias-variance tradeoff, as it’s more formally called). You’ll then build your very first ensemble from scratch.

When you’re finished with this part of the book, you’ll understand why ensemble models are often better than individual models and why you should care about them.
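To give you a taste of the payoff before we get there, here is a minimal sketch of the core idea: train a few different models and let them vote. The dataset, the particular base models, and the use of scikit-learn's VotingClassifier are illustrative assumptions for this preview, not the examples you'll build from scratch later in this part.

```python
# A minimal sketch (illustrative assumptions, not the book's examples):
# combine several weak-ish models by majority vote and compare them
# against each model on its own.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier

# A small, noisy two-class problem
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_models = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("nb", GaussianNB()),
]

# Each base model evaluated on its own
for name, model in base_models:
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")

# The same three models combined by majority (hard) voting
ensemble = VotingClassifier(estimators=base_models, voting="hard")
ensemble.fit(X_train, y_train)
print(f"ensemble: {ensemble.score(X_test, y_test):.3f}")
```

On runs like this, the voting ensemble typically matches or beats the best of its individual members, because the base models make different kinds of mistakes and the vote smooths them out. That intuition, and when it does or doesn't hold, is exactly what the chapters in this part unpack.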