chapter seven

7 Basic statistics

This chapter covers

Descriptive statistics
Frequency and contingency tables
Correlations and covariances
T-tests
Nonparametric statistics

In previous chapters, you learned how to import data into R and use a variety of functions to organize and transform the data into a useful format. You then reviewed basic methods for visualizing data.

Once your data is properly organized and you’ve begun to explore it visually, the next step is typically to describe the distribution of each variable numerically, followed by an exploration of the relationships among selected variables, two at a time. The goal is to answer questions like these:

What kind of mileage are cars getting these days? Specifically, what’s the distribution of miles per gallon (mean, standard deviation, median, range, and so on) in a survey of automobile makes and models?
After a new drug trial, what’s the outcome (no improvement, some improvement, marked improvement) for drug versus placebo groups? Does the gender of the participants have an impact on the outcome?
What’s the correlation between income and life expectancy? Is it significantly different from zero?
Are you more likely to be imprisoned for a crime in some regions of the United States than in others? Are the differences between regions statistically significant?
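As a rough preview of how the first and third questions might be approached, here is a minimal sketch in base R. It assumes that the built-in mtcars and state.x77 datasets are acceptable stand-ins for the automobile survey and the income and life expectancy data; neither dataset is named in the text above, and the variable names are illustrative only.

```r
# Assumption: the built-in mtcars data stands in for the automobile survey.
mpg <- mtcars$mpg

# Numerical description of the distribution of miles per gallon
mpg_summary <- c(
  mean   = mean(mpg),
  sd     = sd(mpg),
  median = median(mpg),
  min    = min(mpg),
  max    = max(mpg)
)
print(round(mpg_summary, 2))

# Assumption: the built-in state.x77 matrix stands in for the
# income and life expectancy data.
income   <- state.x77[, "Income"]
life_exp <- state.x77[, "Life Exp"]

# Pearson correlation, with a test of the null hypothesis that it is zero
income_life_test <- cor.test(income, life_exp)
print(income_life_test)
```

The first part addresses a single variable; the second looks at a relationship between two variables, which mirrors the order of analysis described above.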

7.1 Descriptive statistics

7.1.1 A menagerie of methods

7.1.2 Even more methods

7.1.3 Descriptive statistics by group

7.1.4 Summarizing data interactively with dplyr

7.1.5 Visualizing results

7.2 Frequency and contingency tables

7.2.1 Generating frequency tables

7.2.2 Tests of independence

7.2.3 Measures of association

7.2.4 Visualizing results