Chapter 3. Splitting datasets one feature at a time: decision trees
This chapter covers
Have you ever played a game called Twenty Questions? If not, the game works like this: one person thinks of an object, and the other players try to guess it. The players may ask up to 20 questions and receive only yes or no answers. With each question, the people asking are successively splitting the set of objects that could still be the answer. A decision tree works just like Twenty Questions: you give it a bunch of data, and it generates answers to the game.
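The splitting idea can be sketched in a few lines of Python. This is only a toy illustration of narrowing a set of candidates with yes/no questions, not a decision tree algorithm: the objects, the feature names, and the helper functions `ask` and `play` are all made up for this example, and the order of the questions is fixed by hand.

```python
# Hypothetical toy data: each object is described by yes/no features.
objects = {
    "sparrow": {"is_animal": True, "can_fly": True, "has_feathers": True},
    "penguin": {"is_animal": True, "can_fly": False, "has_feathers": True},
    "salmon": {"is_animal": True, "can_fly": False, "has_feathers": False},
    "kite": {"is_animal": False, "can_fly": True, "has_feathers": False},
    "anchor": {"is_animal": False, "can_fly": False, "has_feathers": False},
}


def ask(candidates, feature, answer):
    """Keep only the candidates whose value for `feature` matches the answer."""
    return {
        name: feats
        for name, feats in candidates.items()
        if feats[feature] == answer
    }


def play(secret, questions):
    """Ask each question in turn and shrink the candidate set accordingly."""
    candidates = dict(objects)
    for feature in questions:
        answer = objects[secret][feature]  # the truthful yes/no reply
        candidates = ask(candidates, feature, answer)
        reply = "yes" if answer else "no"
        print(f"{feature}? {reply} -> {sorted(candidates)}")
        if len(candidates) == 1:
            break  # only one object is left, so the game is over
    return sorted(candidates)


remaining = play("penguin", ["is_animal", "can_fly", "has_feathers"])
```

Each question splits the remaining candidates into two groups and keeps the group that matches the answer. In this sketch the questions are chosen by hand; a decision tree would instead have to work out from the data which question to ask at each step.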
The decision tree is one of the most commonly used classification techniques; recent surveys claim that it’s the single most commonly used one.[1] You don’t have to know much about machine learning to understand how it works.
[1] Giovanni Seni and John Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, Synthesis Lectures on Data Mining and Knowledge Discovery (Morgan and Claypool, 2010), 28.