9 From single trees to forests: Leo Breiman and the logic of ensemble learning
This chapter covers
- Leo Breiman’s Random Forests (2001) and the emergence of ensemble learning from unstable single trees
- Decision trees as expressive but high-variance learners and the roots of their generalization failure
- How bootstrap aggregation, combined with voting or averaging, stabilizes noisy predictors
- The strength-correlation framework as a theory of ensemble generalization
- Random forests as a trade-off between local interpretability and global predictive reliability
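To preview the central mechanism in the list above, here is a minimal, self-contained sketch of bootstrap aggregation. It is not Breiman's algorithm itself but a toy illustration: hypothetical one-dimensional threshold "stumps" stand in for decision trees, each stump is trained on a bootstrap resample of noisy data, and the ensemble predicts by majority vote.

```python
import random

random.seed(0)

# Toy 1-D dataset: the true label is 1 when x > 0.5,
# but roughly 20% of labels are flipped to simulate noise.
data = [(x, (1 if x > 0.5 else 0) ^ (1 if random.random() < 0.2 else 0))
        for x in (random.random() for _ in range(200))]

def train_stump(sample):
    """Fit a one-split 'stump': the threshold minimizing training error."""
    best_t, best_err = 0.0, float("inf")
    for t in (i / 20 for i in range(21)):
        err = sum((1 if x > t else 0) != y for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_predict(x, thresholds):
    """Majority vote over all stumps in the ensemble."""
    votes = sum(1 if x > t else 0 for t in thresholds)
    return 1 if votes * 2 >= len(thresholds) else 0

# Bagging: each stump sees its own bootstrap resample
# (n points drawn with replacement from the training set).
thresholds = [train_stump([random.choice(data) for _ in data])
              for _ in range(25)]

# Individual thresholds vary with the resample; their consensus
# sits near the true split at 0.5 despite the label noise.
avg_t = sum(thresholds) / len(thresholds)
print(round(avg_t, 2))
```

Each stump alone is a high-variance learner (its threshold jumps around with the resample), but the vote is far more stable, which is the intuition the rest of the chapter develops formally.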
The previous chapter examined how Vladimir Vapnik confronted one of the earliest and most persistent failures of machine learning: models that achieved impressive accuracy on training data yet performed unreliably on new examples. Vapnik’s response was to redefine learning itself. Rather than chasing accuracy alone, support vector machines—particularly soft-margin SVMs—sought generalization by explicitly controlling model capacity, balancing margin width against classification error, and grounding learning in statistical theory. Geometry became a disciplined safeguard against overfitting rather than a mere visualization of decision boundaries.