In chapters 8 (regression) and 9 (ANOVA), we explored linear models that can be used to predict a normally distributed response variable from a set of continuous and/or categorical predictor variables. But in many situations, it’s unreasonable to assume that the dependent variable is normally distributed (or even continuous). For example:
- The outcome variable may be categorical. Binary variables (for example, yes/no, passed/failed, lived/died) and polytomous variables (for example, poor/good/excellent, Republican/Democrat/independent) clearly aren’t normally distributed.
- The outcome variable may be a count (for example, the number of traffic accidents in a week, the number of drinks per day). Such variables take only nonnegative integer values (0, 1, 2, and so on). Additionally, their mean and variance are often related (which isn't true for normally distributed variables).
Generalized linear models extend the linear-model framework to include dependent variables that are decidedly non-normal.
In this chapter, we’ll start with a brief overview of generalized linear models and the glm() function used to estimate them. Then we’ll focus on two popular models in this framework: logistic regression (where the dependent variable is categorical) and Poisson regression (where the dependent variable is a count variable).