4 Tidying data for analysis
This chapter covers
- A review of the characteristics of tidy data
- How to reshape a data table using the pivot_longer() function
- Further tidying of data tables with the na_if() function
The last two chapters introduced us to data transformation (using dplyr) and data visualization (using ggplot). We learned a lot by diving right in with example data tables furnished by the edr package. The sw and dmd datasets from Chapters 2 and 3 were tidy datasets, so we didn’t have to think much about the arrangement of the data within those tables. They were used as is, and they just worked. Quite often, however, the datasets you’ll encounter and want to use will not be tidy. This can cause problems in your analyses, so we need to make the effort to tidy them before performing any analysis. This chapter is all about recognizing the difference between tidy and untidy data and then using strategies to tidy that data before getting to the analysis stage.