5 Data Engineering and Data Shaping


This chapter shows you how to use R to organize, or wrangle, data into a shape useful for analysis. Data shaping is the set of steps you must take when your data is not all in one table, or is not in an arrangement ready for analysis.

This chapter covers:

  • Becoming comfortable applying data transforms.
  • Getting started with important data manipulation packages, including data.table and dplyr.
  • Learning the concept of the "shape of data," or "data coordinates."

Figure 5.1 shows the mental model for this chapter: working with data.

Figure 5.1. Chapter 5 Mental Model

Previous chapters assumed the data was already in a ready-to-go form, or we prepared it in such a form for you. This chapter prepares you to take these steps yourself. The basic concept of data wrangling is to picture your data structured in a way that makes your task easier, and then take steps to give your data that structure. To teach this, we work through a number of examples, each consisting of a motivating task followed by a transform that solves it. We concentrate on a set of transforms that are powerful, useful, and cover most common situations.

We will show data wrangling solutions using base R, data.table, and dplyr. Each of these has its advantages, which is why we present more than one solution. Throughout this book we deliberately take a polyglot approach to data wrangling, mixing base R, data.table, and dplyr as convenient.
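As a small illustration of this polyglot approach, here is a sketch of one task written in all three styles: keep only the rows of R's built-in airquality data where Ozone exceeds 100, and only the Ozone and Temp columns. The task and the threshold are our own choice for illustration, and the sketch assumes the data.table and dplyr packages are installed.

```r
# Sketch: one row-and-column selection task written three ways.
# Assumes the data.table and dplyr packages are installed;
# airquality ships with base R. Task and threshold are illustrative only.
library("data.table")
library("dplyr")

# base R: logical row index plus a vector of column names.
# which() drops rows where the comparison is NA (Ozone has missing values);
# a bare logical index would instead return all-NA rows for them.
res_base <- airquality[which(airquality$Ozone > 100), c("Ozone", "Temp")]

# data.table: rows are selected in i, columns in j.
# A logical i that evaluates to NA is treated as FALSE.
dt <- as.data.table(airquality)
res_dt <- dt[Ozone > 100, .(Ozone, Temp)]

# dplyr: a pipeline of verbs; filter() drops rows where the condition is NA.
res_dplyr <- airquality %>%
  filter(Ozone > 100) %>%
  select(Ozone, Temp)

print(res_base)
print(res_dt)
print(res_dplyr)
```

All three versions select the same rows in the same order. They differ mainly in notation: base R needs explicit handling of missing values, data.table packs row and column selection into a single bracket call, and dplyr spells each step out as a named verb.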

5.1  Data Selection

5.1.1  Subsetting Rows and Columns

5.1.2  Removing records with incomplete data

5.1.3  Ordering rows

5.2  Basic Data Transforms

5.2.1  Add new columns

5.2.2  Other simple operations

5.2.3  Parametric programming

5.3  Aggregating Transforms

5.3.1  Scenario

5.3.2  Combining many rows into summary rows

5.4  Multi-Table Data Transforms

5.4.1  Combining two or more ordered data.frames quickly

5.4.2  Principled methods to combine data from multiple tables

5.5  Reshaping Transforms

5.5.1  Moving data from wide to tall form

5.5.2  Moving data from tall to wide form
