3 The simple linear regression model

This chapter covers

Understanding the theory behind simple linear regression
Leveraging the theory to assess and interpret fitted regression models
Analyzing the residuals to check assumptions

Now that we’ve fitted regression lines to several different datasets, we have to address three concerns. The first is glaringly obvious in most of the examples: few of the observed data points lie on the regression lines, and some of them are quite far away! Of course, this makes sense: while we can make an informed guess about, say, the infant mortality rate corresponding to a given literacy rate, we know that our guess will probably be wrong, by an amount that we cannot predict precisely. The second concern is that the equation of the line would almost certainly change if we used different data. For the UNICEF data, for example, the infant mortality rates were collected in 2011 and the literacy rates were collected between 2006 and 2010; if we used older or newer data, the slope and the intercept of the regression line would probably differ from what we found before. The third concern is that the quality of the data matters: since there are always some imperfections in recorded observations, the models built from them can’t be perfect, either.
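The first two concerns can be seen in a small simulation. The sketch below does not use the UNICEF data; it draws two synthetic samples from one made-up linear relationship with random noise and fits a least-squares line to each. The function names (`fit_line`, `simulate_sample`) and every parameter value are assumptions chosen only for illustration.

```python
import random


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def simulate_sample(rng, n=50, intercept=100.0, slope=-1.0, noise_sd=10.0):
    """Draw n points from a hypothetical line plus normally distributed noise.

    The default intercept, slope, and noise level are illustrative
    assumptions, not estimates from any real dataset.
    """
    xs = [rng.uniform(20.0, 100.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


rng = random.Random(1)  # fixed seed so the run is repeatable

# Two samples generated by the same underlying relationship.
x1, y1 = simulate_sample(rng)
x2, y2 = simulate_sample(rng)

fit1 = fit_line(x1, y1)
fit2 = fit_line(x2, y2)

# Residuals: vertical distances from the first sample's points to its fitted line.
residuals1 = [y - (fit1[0] + fit1[1] * x) for x, y in zip(x1, y1)]

print("Sample 1: intercept = %.2f, slope = %.3f" % fit1)
print("Sample 2: intercept = %.2f, slope = %.3f" % fit2)
print("Largest absolute residual in sample 1: %.2f" % max(abs(r) for r in residuals1))
```

Both fitted lines should land near the line used to generate the data, but not exactly on it and not on each other, and individual points miss their own fitted line by varying amounts. Changing the seed or `noise_sd` gives a feel for how much the estimates move from sample to sample.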

3.1 Random variables and the theoretical model

3.2 The normal linear regression model and inferences

3.2.1 Sampling distributions of regression coefficients

3.2.2 Standard errors and parameter estimates

3.2.3 Significance tests for the coefficients

3.2.4 Predicting average and individual responses

3.3 Examples

3.3.1 Analyzing the compressive strength of concrete

3.4 Exercises

3.5 Summary