Chapter 3. Splitting datasets one feature at a time: decision trees

 

This chapter covers

  • Introducing decision trees
  • Measuring consistency in a dataset
  • Using recursion to construct a decision tree
  • Plotting trees in Matplotlib

Have you ever played a game called Twenty Questions? If not, the game works like this: one person thinks of some object, and the other players try to guess what it is. The players may ask up to 20 questions and receive only yes or no answers. Each answer successively splits the set of objects the answerer could be thinking of, narrowing the candidates until only one remains. A decision tree works just like the game Twenty Questions: you give it a collection of data, and it learns a series of questions that split the data into progressively more consistent groups.
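The splitting idea behind Twenty Questions can be sketched in a few lines of Python. The tiny dataset and feature names below are invented purely for illustration; the chapter's real construction algorithm comes later:

```python
# A toy "Twenty Questions" dataset: each animal is described by
# yes/no features, and a question narrows the set of candidates.
# (Hypothetical data, invented for illustration.)
animals = [
    {"name": "salmon", "lives in water": True,  "has feathers": False},
    {"name": "eagle",  "lives in water": False, "has feathers": True},
    {"name": "dog",    "lives in water": False, "has feathers": False},
]

def split(items, question):
    """Partition a set of items on one yes/no feature."""
    yes = [item for item in items if item[question]]
    no = [item for item in items if not item[question]]
    return yes, no

# Each question partitions the remaining candidates, just as each
# answer in Twenty Questions shrinks the set of possible objects.
yes_group, no_group = split(animals, "lives in water")
print([a["name"] for a in yes_group])  # ['salmon']
print([a["name"] for a in no_group])   # ['eagle', 'dog']
```

A decision tree applies this same split repeatedly, choosing at each step the question that best separates the data.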

The decision tree is one of the most popular classification techniques; recent surveys rank it as the single most commonly used.[1] You don't need much machine-learning background to understand how it works.

1 Giovanni Seni and John Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, Synthesis Lectures on Data Mining and Knowledge Discovery (Morgan and Claypool, 2010), 28.

3.1. Tree construction

3.2. Plotting trees in Python with Matplotlib annotations

3.3. Testing and storing the classifier

3.4. Example: using decision trees to predict contact lens type

3.5. Summary