chapter twelve

12 Resampling statistics and bootstrapping

This chapter covers

Understanding the logic of permutation tests
Applying permutation tests to linear models
Using bootstrapping to obtain confidence intervals

In chapters 7, 8, and 9, we reviewed statistical methods that test hypotheses and estimate confidence intervals for population parameters by assuming that the observed data is sampled from a normal distribution or some other well-known theoretical distribution. But in many cases, this assumption is unwarranted. Statistical approaches based on randomization and resampling can be used when the data is sampled from unknown or mixed distributions, when sample sizes are small, when outliers are a problem, or when devising an appropriate test based on a theoretical distribution is too complex or mathematically intractable.

In this chapter, we’ll explore two broad statistical approaches that use randomization: permutation tests and bootstrapping. Historically, these methods were available only to experienced programmers and expert statisticians. Contributed packages in R now make them readily available to a wider group of data analysts.
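As a rough preview of the two ideas, here is a minimal sketch in base R rather than in the contributed packages covered in this chapter. It assumes the built-in mtcars data set, and the names mean_diff, n_perm, and n_boot are illustrative choices, not part of any package. The first half shuffles group labels to approximate a permutation test for a difference in means. The second half resamples with replacement to obtain a percentile bootstrap confidence interval for a median.

```r
# Minimal base R sketch; object names are illustrative assumptions.
set.seed(1234)                       # make the random resampling reproducible

# Built-in data: miles per gallon for automatic (am = 0) and manual (am = 1) cars
mpg   <- mtcars$mpg
group <- mtcars$am

# Statistic of interest: difference in group means (manual minus automatic)
mean_diff <- function(y, g) mean(y[g == 1]) - mean(y[g == 0])
observed  <- mean_diff(mpg, group)

# Permutation test: shuffle the group labels many times to build a
# reference distribution for the statistic under "no group difference"
n_perm     <- 9999
perm_diffs <- replicate(n_perm, mean_diff(mpg, sample(group)))

# Two-sided p-value; the +1 counts the observed arrangement itself
p_value <- (sum(abs(perm_diffs) >= abs(observed)) + 1) / (n_perm + 1)

# Bootstrap: resample the data with replacement and recompute the statistic
n_boot       <- 2000
boot_medians <- replicate(n_boot, median(sample(mpg, replace = TRUE)))

# Percentile 95% confidence interval for the median
ci <- quantile(boot_medians, c(0.025, 0.975))

print(p_value)
print(ci)
```

Because both procedures rely on random draws, results vary slightly from run to run unless a seed is set. Using more replications reduces that variability at the cost of computing time.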

12.1 Permutation tests

12.2 Permutation tests with the coin package

12.2.1 Independent two-sample and k-sample tests

12.2.2 Independence in contingency tables

12.2.3 Independence between numeric variables

12.2.4 Dependent two-sample and k-sample tests

12.2.5 Going further

12.3 Permutation tests with the lmPerm package

12.3.1 Simple and polynomial regression

12.3.2 Multiple regression

12.3.3 One-way ANOVA and ANCOVA

12.3.4 Two-way ANOVA

12.4 Additional comments on permutation tests

12.5 Bootstrapping

12.6 Bootstrapping with the boot package

12.6.1 Bootstrapping a single statistic