18 Advanced methods for missing data

This chapter covers

  • Identifying missing data
  • Visualizing missing-data patterns
  • Deleting missing values
  • Imputing missing values

In previous chapters, we focused on analyzing complete datasets (that is, datasets without missing values). Although doing so helps simplify the presentation of statistical and graphical methods, in the real world, missing data are ubiquitous.

In some ways, the impact of missing data is a subject most of us want to avoid. Statistics books may not mention it or may limit discussion to a few paragraphs. Statistical packages offer automatic handling of missing data using methods that may not be optimal. Even though most data analyses (at least in the social sciences) involve missing data, this topic is rarely mentioned in the methods and results sections of journal articles. Given how often missing values occur and the degree to which their presence can invalidate study results, it’s fair to say that the subject has received insufficient attention outside of specialized books and courses.

18.1 Steps in dealing with missing data

18.2 Identifying missing values

18.3 Exploring missing-values patterns

18.3.1 Visualizing missing values

18.3.2 Using correlations to explore missing values

18.4 Understanding the sources and impact of missing data

18.5 Rational approaches for dealing with incomplete data

18.6 Deleting missing data

18.6.1 Complete-case analysis (listwise deletion)

18.6.2 Available case analysis (pairwise deletion)

18.7 Single imputation

18.7.1 Simple imputation

18.7.2 K-nearest neighbor imputation

18.7.3 missForest

18.8 Multiple imputation

Summary