3 Basic data management

This chapter covers

Manipulating dates and missing values
Understanding data type conversions
Creating and recoding variables
Sorting, merging, and subsetting datasets
Selecting and dropping variables

In chapter 2, we covered various methods for importing data into R. Unfortunately, getting your data into the rectangular arrangement of a matrix or data frame is only the first step in preparing it for analysis. To paraphrase Captain Kirk in the Star Trek episode “A Taste of Armageddon” (and proving my geekiness once and for all), “Data is a messy business, a very, very messy business.” In my own work, as much as 60% of the time spent on a data analysis project goes to cleaning and organizing the data. I’ll go out on a limb and say that the same is probably true for most real-world data analysts. Let’s take a look at an example.

3.1 A working example

One of the topics I study in my current job is how men and women differ in the ways they lead their organizations. Typical questions might be:

Do men and women in management positions differ in the degree to which they defer to superiors?
Does this vary from country to country, or are these gender differences universal?

One way to address these questions is to have bosses in multiple countries rate their managers on deferential behavior, using questions like the following.

This manager asks my opinion before making personnel decisions.
1 = strongly disagree
2 = disagree
3 = neither agree nor disagree
4 = agree
5 = strongly agree
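As a minimal sketch of how responses to an item like this might be stored in R, assuming each answer is recorded as its numeric code, the codes 1 to 5 can be mapped to their anchor labels with an ordered factor. The `ratings` vector below is hypothetical and purely illustrative, not data from the study.

```r
# Hypothetical ratings from five bosses on the item above (illustrative only)
ratings <- c(5, 4, 2, 3, 4)

# Map numeric codes 1-5 to the scale's anchor labels, keeping their order
likert <- factor(ratings,
                 levels = 1:5,
                 labels = c("strongly disagree", "disagree",
                            "neither agree nor disagree", "agree",
                            "strongly agree"),
                 ordered = TRUE)

# Count how many responses fall at each point of the scale
table(likert)
```

Using an ordered factor rather than plain character labels preserves the ranking of the responses, so comparisons such as `likert[3] < likert[2]` remain meaningful.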
