10 Exploring advanced methods

 

This chapter covers

  • Decision tree–based models
  • Generalized additive models
  • Support vector machines

In chapter 7, you learned about linear methods for fitting predictive models. These models are the bread-and-butter methods of machine learning; they are easy to fit; they are small, portable, and efficient; they sometimes provide useful advice; and they can work well in a wide variety of situations. However, they also make strong assumptions about the world: namely, that the outcome is linearly related to all the inputs, and all the inputs contribute additively to the outcome. In this chapter, you will learn about methods that relax these assumptions.

Figure 10.1 represents our mental model for what we'll do in this chapter: use R to master the science of building supervised machine learning models.

Figure 10.1. Mental model
Example

Suppose you want to study the relationship between mortality rates and measures of a person’s health or fitness, including BMI (body mass index).

Figure 10.2 shows the relationship between BMI and mortality hazard ratio for a population of older Thais over a four-year period.[1] It shows that both high and low BMI are associated with higher mortality rates: the relationship between BMI and mortality is not linear. So a straightforward linear model to predict mortality rates based (partly) on BMI may not perform well.

Figure 10.2. Mortality rates of men and women as a function of body mass index

10.1. Tree-based methods

10.1.1. A basic decision tree

10.1.2. Using bagging to improve prediction

10.1.3. Using random forests to further improve prediction

10.1.4. Gradient-boosted trees

10.1.5. Tree-based model takeaways

10.2. Using generalized additive models (GAMs) to learn non-monotone relationships

10.2.1. Understanding GAMs

10.2.2. A one-dimensional regression example