2 Homogeneous Parallel Ensembles: Bagging and Random Forests
This chapter covers
- Training homogeneous parallel ensembles
- Implementing and understanding how bagging works
- Implementing and understanding how random forests work
- Training variants with pasting, random subspaces, random patches, and ExtraTrees
- Using bagging and random forests in practice
In Chapter 1, we introduced ensemble learning and created our first rudimentary ensemble. To recap, an ensemble method relies on the notion of “wisdom of the crowd”: the combined answer of many diverse models is often better than any one individual answer.
We begin our journey into ensemble learning in earnest with parallel ensemble methods. They are a natural starting point because, conceptually, they are easy to understand and implement.
Parallel ensemble methods, as the name suggests, train each component base estimator independently of the others, which means the base estimators can be trained in parallel. As we will see, parallel ensemble methods can be further distinguished as homogeneous and heterogeneous parallel ensembles, depending on the kind of learning algorithms they use.
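To make this independence concrete, here is a minimal sketch that trains a handful of decision trees, each on its own bootstrap sample, and fits them concurrently with joblib. The dataset, the seeds, and the `fit_one` helper are illustrative assumptions for this sketch, not an API defined by any particular library.

```python
# A minimal sketch of parallel ensemble training, assuming scikit-learn
# and joblib are available. Each base estimator is fit on its own
# bootstrap sample, with no dependence on the other estimators.
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

def fit_one(X, y, seed):
    # Draw a bootstrap sample (sampling with replacement)
    # and fit one base estimator on it
    idx = np.random.default_rng(seed).integers(0, len(X), size=len(X))
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])

# No estimator depends on any other, so all 10 fits can run concurrently
estimators = Parallel(n_jobs=-1)(
    delayed(fit_one)(X, y, seed) for seed in range(10))
```

Because each fit touches only its own bootstrap sample, adding more base estimators costs little extra wall-clock time on a multicore machine.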
In this chapter, we will learn about homogeneous parallel ensembles, whose component models are all trained using the same machine-learning algorithm. This is in contrast to heterogeneous parallel ensembles (covered in the next chapter), whose component models are trained using different machine-learning algorithms.
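As a rough illustration of this distinction, the snippet below constructs one ensemble of each kind using off-the-shelf scikit-learn classes; the particular base estimators and parameter values are arbitrary choices for the example.

```python
# Contrasting homogeneous and heterogeneous parallel ensembles
# with scikit-learn's built-in ensemble classes.
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Homogeneous: every component model is trained with the same
# learning algorithm (here, 10 decision trees)
homogeneous = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)

# Heterogeneous: the component models come from different algorithms
heterogeneous = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier()),
    ("logreg", LogisticRegression()),
    ("nb", GaussianNB()),
])
```

Both objects expose the same fit/predict interface; the difference lies entirely in how the component models are produced and combined.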