2 Homogeneous parallel ensembles: Bagging and random forests

 

This chapter covers

  • Training homogeneous parallel ensembles
  • Implementing and understanding bagging
  • Implementing and understanding random forests
  • Training variants with pasting, random subspaces, random patches, and Extra Trees
  • Using bagging and random forests in practice

In chapter 1, we introduced ensemble learning and built our first rudimentary ensemble. To recap, an ensemble method relies on the notion of the “wisdom of the crowd”: the combined answer of many models is often better than any single model’s answer. In this chapter, we begin our journey into ensemble learning in earnest with parallel ensemble methods, which are conceptually the easiest ensembles to understand and implement.

Parallel ensemble methods, as the name suggests, train each component base estimator independently of the others, which means they can be trained in parallel. Parallel ensembles can be further distinguished as homogeneous or heterogeneous, depending on whether every base estimator is trained with the same learning algorithm or with different ones. This chapter focuses on homogeneous parallel ensembles, the best-known of which are bagging and random forests.
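
To make the idea concrete, here is a minimal sketch of a parallel ensemble. The data set, the number of trees, and the other settings are illustrative assumptions, not the chapter's own listing: five decision trees are fit independently, each on a different random half of the training data, and their answers are combined by majority vote. Because no tree depends on any other, the five fits could just as well run in parallel.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy two-class data set (an illustrative choice)
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, random_state=42)

# Train each base estimator independently of the others
rng = np.random.default_rng(42)
trees = []
for _ in range(5):
    idx = rng.choice(len(X_trn), size=len(X_trn) // 2, replace=False)  # a random half of the data
    trees.append(DecisionTreeClassifier(max_depth=3).fit(X_trn[idx], y_trn[idx]))

# Combine the individual answers by majority vote
preds = np.array([tree.predict(X_tst) for tree in trees])              # shape (5, n_test)
majority = np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, preds)
print("ensemble accuracy:", np.mean(majority == y_tst))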

2.1 Parallel ensembles

 
 
 
 

2.2 Bagging: Bootstrap aggregating

 

2.2.1 Intuition: Resampling and model aggregation
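
The core of bagging is bootstrap resampling: drawing a sample of the same size as the original training set, with replacement. The following is a minimal sketch of a single bootstrap sample on a toy array of our own choosing; on average, only about 63.2% of the original examples appear in any one bootstrap sample, and the remainder are "out-of-bag" for that replicate.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))                     # toy data set (illustrative)

# A bootstrap sample: draw n row indices *with* replacement
boot_idx = rng.choice(n, size=n, replace=True)
X_boot = X[boot_idx]

# Roughly 63.2% of the original rows appear at least once;
# the rest are "out-of-bag" for this bootstrap sample
frac_in_sample = len(np.unique(boot_idx)) / n
print(f"fraction of original examples drawn: {frac_in_sample:.3f}")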

 
 
 
 

2.2.2 Implementing bagging
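
A bare-bones bagging classifier can be written in a few lines: fit each decision tree on its own bootstrap sample, then aggregate test-time predictions by majority vote. The function names and defaults below are illustrative sketches, not the chapter's own listing.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagging(X, y, n_estimators=100, random_state=0):
    # Fit n_estimators unpruned decision trees, each on its own bootstrap sample
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(random_state)
    ensemble = []
    for _ in range(n_estimators):
        idx = rng.choice(len(X), size=len(X), replace=True)   # sample with replacement
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def predict_bagging(ensemble, X):
    # Aggregate the individual trees' predictions by majority vote
    preds = np.array([tree.predict(X) for tree in ensemble])  # shape (n_estimators, n_samples)
    return np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, preds)

Unpruned trees are used deliberately here: deep individual trees have low bias but high variance, and averaging many of them is precisely how bagging reduces that variance.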

 
 
 
 

2.2.3 Bagging with scikit-learn
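
scikit-learn provides bagging out of the box through sklearn.ensemble.BaggingClassifier (and BaggingRegressor for regression). The snippet below is a minimal sketch; the breast-cancer data set is used purely as a stand-in example. Note that scikit-learn 1.2 and later pass the base learner as estimator, while older versions call the argument base_estimator.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)          # illustrative data set
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, random_state=13)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base learner (base_estimator before scikit-learn 1.2)
    n_estimators=500,                    # number of bootstrap samples, and thus of trees
    max_samples=1.0,                     # each bootstrap sample is as large as the training set
    bootstrap=True,                      # sample with replacement (this is what makes it bagging)
    oob_score=True,                      # estimate generalization error from out-of-bag examples
    random_state=13)
bag.fit(X_trn, y_trn)

print("test accuracy:", bag.score(X_tst, y_tst))
print("out-of-bag estimate:", bag.oob_score_)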

 
 
 

2.2.4 Faster training with parallelization
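
Because every tree is trained independently, the fits can be distributed across CPU cores. In scikit-learn this is controlled by the n_jobs argument; the timing comparison below is an illustrative sketch on synthetic data.

from time import perf_counter
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=13)  # illustrative data

for n_jobs in (1, -1):                  # 1 = a single core, -1 = all available cores
    bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                            n_estimators=500, n_jobs=n_jobs, random_state=13)
    start = perf_counter()
    bag.fit(X, y)
    print(f"n_jobs={n_jobs:>2d}: fit in {perf_counter() - start:.1f}s")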

 
 
 

2.3 Random forests

 
 
 

2.3.1 Randomized decision trees
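
The extra ingredient in a random forest is split-time feature randomization: at every split, a tree considers only a small random subset of the features rather than all of them. One way to see this with scikit-learn is to pass a decision tree with max_features='sqrt' to BaggingClassifier; the result is, in essence, a hand-rolled random forest. The settings below are illustrative.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# A decision tree that, at every split, examines only a random subset of
# sqrt(n_features) candidate features instead of all of them
randomized_tree = DecisionTreeClassifier(max_features='sqrt')

# Bagging over such randomized trees is essentially a random forest
forest_by_hand = BaggingClassifier(
    estimator=randomized_tree,
    n_estimators=500,
    bootstrap=True,
    random_state=13)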

 

2.3.2 Random forests with scikit-learn
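
scikit-learn packages this combination directly as sklearn.ensemble.RandomForestClassifier (and RandomForestRegressor). A minimal sketch, again using the breast-cancer data set as a stand-in example:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)          # illustrative data set
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, random_state=13)

rf = RandomForestClassifier(
    n_estimators=500,        # number of trees
    max_features='sqrt',     # features considered at each split
    oob_score=True,          # out-of-bag error estimate, exactly as in bagging
    n_jobs=-1,               # fit the trees on all available cores
    random_state=13)
rf.fit(X_trn, y_trn)

print("test accuracy:", rf.score(X_tst, y_tst))
print("out-of-bag estimate:", rf.oob_score_)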

 

2.3.3 Feature importances
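
A fitted random forest exposes impurity-based feature importances through its feature_importances_ attribute, averaged over all the trees. A short, illustrative sketch that prints the top-ranked features:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()                       # illustrative data set
rf = RandomForestClassifier(n_estimators=500, random_state=13)
rf.fit(data.data, data.target)

# Impurity-based importances, averaged over all trees; they sum to 1.0
importances = rf.feature_importances_
for i in np.argsort(importances)[::-1][:5]:       # the five highest-ranked features
    print(f"{data.feature_names[i]:<25s} {importances[i]:.3f}")

Impurity-based importances are computed from the training data and can favor features with many distinct values; sklearn.inspection.permutation_importance is a commonly used alternative when that is a concern.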

 
 
 

2.4 More homogeneous parallel ensembles

 
 
 
 

2.4.1 Pasting
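
Pasting is the sampling-without-replacement counterpart of bagging. With BaggingClassifier this amounts to setting bootstrap=False and drawing subsets smaller than the full training set; the configuration below is an illustrative sketch.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Pasting: each tree sees a random subset of examples drawn *without* replacement
pasting = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=500,
    max_samples=0.5,      # each subset contains 50% of the training examples
    bootstrap=False,      # without replacement, so pasting rather than bagging
    random_state=13)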

 

2.4.2 Random subspaces and random patches
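
Random subspaces resample the features instead of the examples, and random patches resample both at once. Both variants map onto BaggingClassifier arguments: max_features and bootstrap_features control the feature axis, max_samples and bootstrap control the example axis. The fractions below are illustrative choices.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Random subspaces: every tree sees all training examples,
# but only a random subset of the features
subspaces = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=500,
    max_samples=1.0, bootstrap=False,              # keep all training examples
    max_features=0.5, bootstrap_features=False,    # each tree gets a random half of the features
    random_state=13)

# Random patches: random subsets of examples *and* features for every tree
patches = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=500,
    max_samples=0.75, bootstrap=True,              # resample the examples as well
    max_features=0.5, bootstrap_features=False,
    random_state=13)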

 
 

2.4.3 Extra Trees
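
Extra Trees (extremely randomized trees) push the randomization one step further: candidate split thresholds are drawn at random rather than optimized, and by default each tree is trained on the whole training set instead of a bootstrap sample. scikit-learn implements this as ExtraTreesClassifier; the sketch below is illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)          # illustrative data set
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, random_state=13)

# Like a random forest, but with randomly drawn split thresholds and,
# by default, no bootstrap sampling of the training examples
xt = ExtraTreesClassifier(
    n_estimators=500,
    max_features='sqrt',
    bootstrap=False,       # the scikit-learn default for Extra Trees
    n_jobs=-1,
    random_state=13)
xt.fit(X_trn, y_trn)
print("test accuracy:", xt.score(X_tst, y_tst))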

 
 