4 Tidying data for analysis

 

This chapter covers

  • A review of the characteristics of tidy data
  • How to reshape a data table using the pivot_longer() function
  • Further tidying of data tables with the na_if() function

The last two chapters introduced data transformation (using dplyr) and data visualization (using ggplot). We covered a lot of ground, diving right in with example data tables furnished by the edr package. The sw and dmd datasets from Chapters 2 and 3 were tidy datasets, so we didn’t have to think much about how the data was arranged within those tables; they were used as is, and they just worked. Quite often, however, the datasets you’ll encounter and want to use will not be tidy. Untidy data can cause problems in your analyses, so it is worth making the effort to tidy it before performing any analysis. This chapter is all about recognizing the difference between tidy and untidy data and then using strategies to tidy that data before getting to the analysis stage.

4.1       What Is Tidy Data?

4.2       Using tidyr to Tidy Our Tables

4.2.1   Identifying Untidiness and Proposing Some Solutions to Tidy Up

4.2.2   Addressing Untidiness by Using the pivot_longer() Function

4.2.3   Using the separate() Function to Split a Column into Several

4.2.4   Inspecting Our Tidied Data by Plotting with ggplot

4.2.5   Replacing Missing Values with Actual NAs

4.3       Exercises

4.4       Answers to Exercises

4.5       Summary