Chapter 3. Splitting datasets one feature at a time: decision trees

 

This chapter covers

  • Introducing decision trees
  • Measuring consistency in a dataset
  • Using recursion to construct a decision tree
  • Plotting trees in Matplotlib

Have you ever played a game called Twenty Questions? If not, the game works like this: one person thinks of some object, and the other players try to guess what it is. The players may ask up to 20 questions and receive only yes or no answers. Each answer successively splits the set of objects the answerer could be thinking of, narrowing the candidates until only one remains. A decision tree works just like the game Twenty Questions: you give it a collection of data, and it learns a series of questions that split the data into progressively more consistent groups.
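The splitting idea behind Twenty Questions can be sketched in a few lines of Python. The tiny dataset and feature names below are invented purely for illustration; the chapter's real construction algorithm comes later:

```python
# A toy "Twenty Questions" dataset: each animal is described by
# yes/no features, and a question narrows the set of candidates.
# (Hypothetical data, invented for illustration.)
animals = [
    {"name": "salmon", "lives in water": True,  "has feathers": False},
    {"name": "eagle",  "lives in water": False, "has feathers": True},
    {"name": "dog",    "lives in water": False, "has feathers": False},
]

def split(items, question):
    """Partition a set of items on one yes/no feature."""
    yes = [item for item in items if item[question]]
    no = [item for item in items if not item[question]]
    return yes, no

# Each question partitions the remaining candidates, just as each
# answer in Twenty Questions shrinks the set of possible objects.
yes_group, no_group = split(animals, "lives in water")
print([a["name"] for a in yes_group])  # ['salmon']
print([a["name"] for a in no_group])   # ['eagle', 'dog']
```

A decision tree applies this same split repeatedly, choosing at each step the question that best separates the data.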

The decision tree is one of the most popular classification techniques; recent surveys rank it as the single most commonly used.[1] You don't need much machine-learning background to understand how it works.

1 Giovanni Seni and John Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, Synthesis Lectures on Data Mining and Knowledge Discovery (Morgan and Claypool, 2010), 28.

3.1. Tree construction

3.2. Plotting trees in Python with Matplotlib annotations

3.3. Testing and storing the classifier

3.4. Example: using decision trees to predict contact lens type

3.5. Summary