13 Generalized linear models

 

This chapter covers

  • Formulating a generalized linear model
  • Predicting categorical outcomes
  • Modeling count data

In chapters 8 (regression) and 9 (ANOVA), we explored linear models that can be used to predict a normally distributed response variable from a set of continuous and/or categorical predictor variables. But in many situations, it’s unreasonable to assume that the dependent variable is normally distributed (or even continuous). For example:

  • The outcome variable may be categorical. Binary variables (for example, yes/no, passed/failed, lived/died) and polytomous variables (for example, poor/good/excellent, Republican/Democrat/independent) clearly aren’t normally distributed.
  • The outcome variable may be a count (for example, the number of traffic accidents in a week, the number of drinks per day). Such variables take only nonnegative whole-number values (0, 1, 2, ...). Additionally, their mean and variance are often related, whereas in a normal distribution the mean and variance are separate parameters that can vary independently.
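
To make the mean-variance relationship concrete, the following sketch simulates counts from Poisson distributions with three different means. The means, sample size, and seed are arbitrary choices for illustration, not data from this book. For Poisson data the variance is expected to equal the mean, so the two rise together; real count data often show a variance larger than the mean, which is the overdispersion problem taken up later in this chapter.

```r
# Illustrative only: simulate count outcomes from Poisson distributions
# with different (arbitrarily chosen) means
set.seed(1234)
means  <- c(1, 5, 20)
counts <- lapply(means, function(m) rpois(10000, lambda = m))

# Counts are whole numbers and never negative
all_nonneg <- all(sapply(counts, function(x) all(x >= 0)))

# For Poisson data, the sample variance should track the sample mean
summary_tab <- data.frame(
  mean     = sapply(counts, mean),
  variance = sapply(counts, var)
)
print(round(summary_tab, 2))
```

As the mean grows, so does the spread, which violates the constant-variance assumption of ordinary regression.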

Generalized linear models extend the linear-model framework to include dependent variables that are decidedly non-normal.
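
One way to see that the ordinary linear model sits inside this framework is to fit the same data with lm() and with glm() using a gaussian family and an identity link. The sketch below does this on simulated data (the true intercept 3, slope 2, and noise level are invented for illustration); the two sets of coefficient estimates should agree up to numerical precision.

```r
# Illustrative data: one continuous predictor, normally distributed response
set.seed(42)
x <- runif(100, 0, 10)
y <- 3 + 2 * x + rnorm(100, sd = 1.5)

# Ordinary least squares regression
fit_lm <- lm(y ~ x)

# The same model expressed as a GLM: normal errors, identity link
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

# Side-by-side coefficient estimates
print(cbind(lm = coef(fit_lm), glm = coef(fit_glm)))
```

Swapping the family argument for a non-normal distribution is what lets the same machinery handle binary and count outcomes.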

In this chapter, we’ll start with a brief overview of generalized linear models and the glm() function used to estimate them. Then we’ll focus on two popular models in this framework: logistic regression (where the dependent variable is categorical) and Poisson regression (where the dependent variable is a count variable).
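
Both models can be fit with the same glm() call by changing only the family argument. The sketch below simulates a binary outcome and a count outcome from a single predictor and then fits a logistic and a Poisson regression. The variable names passed and accidents, the true coefficients, and the sample size are hypothetical, chosen only to echo the examples above.

```r
# Illustrative simulated data; names and true coefficients are hypothetical
set.seed(2024)
n <- 500
x <- rnorm(n)

# Binary outcome: the log-odds of passing are linear in x
p      <- plogis(-0.5 + 1.2 * x)
passed <- rbinom(n, size = 1, prob = p)

# Count outcome: the log of the expected count is linear in x
mu        <- exp(0.3 + 0.7 * x)
accidents <- rpois(n, lambda = mu)

dat <- data.frame(x, passed, accidents)

# Logistic regression: binomial family (logit link by default)
fit_logit <- glm(passed ~ x, family = binomial, data = dat)

# Poisson regression: poisson family (log link by default)
fit_pois <- glm(accidents ~ x, family = poisson, data = dat)

print(round(coef(fit_logit), 2))
print(round(coef(fit_pois), 2))

# Fitted values on the response scale: probabilities and expected counts
p_hat  <- fitted(fit_logit)
mu_hat <- fitted(fit_pois)
```

Because of the default links, coef() reports estimates on the log-odds and log-count scales, while fitted() returns values on the original scale of each outcome: probabilities between 0 and 1, and positive expected counts.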

13.1 Generalized linear models and the glm() function

13.1.1 The glm() function

13.1.2 Supporting functions

13.1.3 Model fit and regression diagnostics

13.2 Logistic regression

13.2.1 Interpreting the model parameters

13.2.2 Assessing the impact of predictors on the probability of an outcome

13.2.3 Overdispersion

13.2.4 Extensions

13.3 Poisson regression

13.3.1 Interpreting the model parameters

13.3.2 Overdispersion