This chapter covers
- Understanding the theory behind simple linear regression
- Leveraging the theory to assess and interpret fitted regression models
- Analyzing the residuals to check assumptions
Now that we’ve fitted regression lines to several different datasets, we have to confront three concerns. The first is glaringly obvious in most of the examples: few of the observed data points are on the regression lines, and some of them seem pretty far away! Of course, this makes sense: while we can make an informed guess about, say, the infant mortality rate corresponding to a given literacy rate, we know that our guess is probably going to be wrong, by an amount that we cannot predict precisely. The second concern is that the equation of the line would almost certainly change if we used different data. For the UNICEF data, for example, the infant mortality rates were collected in 2011 and the literacy rates were collected between 2006 and 2010; if we used older or newer data, the slope and the intercept of the regression line would probably differ from what we found before. The third concern is the quality of the data: since there are always some imperfections in recorded observations, the models built from them can’t be perfect, either.
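The first two concerns are easy to see in a small simulation. The sketch below is only an illustration, written in Python with NumPy: it does not use the UNICEF data, and the data-generating line, the noise level, and the helper names `simulate_sample` and `fit_line` are all assumptions made up for this example. It draws two samples from the same hypothetical process, fits a least-squares line to each, and computes the residuals for the first fit.

```python
import numpy as np

def simulate_sample(rng, n=50):
    """Draw one synthetic sample (hypothetical process, not real data)."""
    x = rng.uniform(20, 100, size=n)       # stand-in for a predictor such as a rate in percent
    noise = rng.normal(0, 10, size=n)      # the part of y that x cannot explain
    y = 120 - 1.0 * x + noise              # assumed "true" line, chosen only for illustration
    return x, y

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

rng = np.random.default_rng(0)
x1, y1 = simulate_sample(rng)
x2, y2 = simulate_sample(rng)              # a second sample from the same process

b0_1, b1_1 = fit_line(x1, y1)
b0_2, b1_2 = fit_line(x2, y2)

# Residuals: observed values minus the values predicted by the fitted line
residuals_1 = y1 - (b0_1 + b1_1 * x1)

print(f"sample 1: intercept = {b0_1:.2f}, slope = {b1_1:.3f}")
print(f"sample 2: intercept = {b0_2:.2f}, slope = {b1_2:.3f}")
print(f"largest absolute residual in sample 1: {np.abs(residuals_1).max():.2f}")
```

Running this should show two fitted lines with similar but not identical slopes and intercepts, even though both samples come from the same underlying line, and residuals that are clearly not zero. Changing the seed or the noise level changes how far apart the two fits land.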