7 Classifying with trees: Decision trees


This chapter covers:

  • What are decision trees?
  • Using the recursive partitioning algorithm to predict animal classes
  • An important weakness of decision trees

There’s nothing like the great outdoors. I live in the countryside, and when I walk my dog in the woods, I’m reminded just how much we rely on trees. Trees produce the atmosphere we breathe, create habitats for wildlife, provide us with food, and are surprisingly good at making predictions. Yes, you read that right: trees are good at making predictions. But before you go asking the birch in your back garden for next week’s lottery numbers, I should clarify that I’m referring to a family of supervised learning algorithms that use a branching tree structure. These algorithms can be used to solve both classification and regression tasks, can handle continuous and categorical predictors, and are naturally suited to multiclass classification problems.
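
To make that last point concrete, here is a minimal sketch of fitting a classification tree with the rpart package (the implementation we tune later in this chapter). It uses R’s built-in iris dataset purely for illustration, standing in for the zoo data we load in section 7.3: the outcome has three classes, and all four predictors are continuous.

library(rpart)

# method = "class" requests a classification tree; rpart handles the
# three-class outcome and the continuous predictors directly
irisTree <- rpart(Species ~ ., data = iris, method = "class")

print(irisTree)                                           # text view of the learned splits
predict(irisTree, newdata = iris[1:5, ], type = "class")  # predicted classes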

Note

Remember that a predictor variable is a variable we believe may contain information about the value of our outcome variable. Continuous predictors can take any numeric value on their measurement scale, while categorical predictors can take only a finite set of discrete values, or categories.
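
For example, in R a continuous predictor is usually stored as a numeric vector and a categorical predictor as a factor. The variables below are hypothetical and serve only to illustrate the distinction:

# bodyLength is continuous; hasFeathers is categorical (hypothetical data)
animals <- data.frame(
  bodyLength  = c(1.2, 0.4, 2.1),            # any numeric value on the scale
  hasFeathers = factor(c("yes", "no", "no"))  # a finite set of categories
)

str(animals)  # reports "num" for the continuous column, "Factor" for the categorical one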

7.1  What is the recursive partitioning algorithm?

7.1.1  Using Gini gain to split the tree

7.1.2  What about continuous and multi-level categorical predictors?

7.1.3  Hyperparameters of the rpart algorithm

7.2  Building our first decision tree model

7.3  Loading and exploring the zoo dataset

7.4  Training the decision tree model

7.4.1  Training the model with the tuned hyperparameters

7.5  Cross-validating our decision tree model

7.6  Strengths and weaknesses of tree-based algorithms

7.7  Summary