3 Basic data management

 

This chapter covers

  • Manipulating dates and missing values
  • Understanding data type conversions
  • Creating and recoding variables
  • Sorting, merging, and subsetting datasets
  • Selecting and dropping variables

In chapter 2, we covered various methods for importing data into R. Unfortunately, getting your data in the rectangular arrangement of a matrix or data frame is only the first step in preparing it for analysis. To paraphrase Captain Kirk in the Star Trek episode “A Taste of Armageddon” (and proving my geekiness once and for all), “Data is a messy business—a very, very messy business.” In my own work, as much as 60% of any data analysis project is spent cleaning and organizing the data. I’ll go out on a limb and say that the same is probably true for most real-world data analysts. Let’s take a look at an example.

3.1 A working example

One of the topics that I study in my current job is how men and women differ in the ways they lead their organizations. Typical questions might be

  • Do men and women in management positions differ in the degree to which they defer to superiors?
  • Does this vary from country to country, or are these gender differences universal?

One way to address these questions is to have bosses in multiple countries rate their managers on deferential behavior, using questions like the following.

This manager asks my opinion before making personnel decisions.

  1  strongly disagree
  2  disagree
  3  neither agree nor disagree
  4  agree
  5  strongly agree
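
As a rough sketch of how answers to an item like this one might be held in R, the snippet below stores a handful of responses as an ordered factor. The ratings are invented purely for illustration, and the names item_labels, ratings, and rating_factor are hypothetical rather than taken from any real dataset.

```r
# Labels for the five response options, in scale order
item_labels <- c("strongly disagree", "disagree",
                 "neither agree nor disagree", "agree", "strongly agree")

# Hypothetical responses from five bosses (made-up values);
# NA marks a boss who skipped the item
ratings <- c(5, 3, 4, NA, 2)

# An ordered factor keeps the text labels attached to the numeric codes
rating_factor <- factor(ratings, levels = 1:5,
                        labels = item_labels, ordered = TRUE)

# Count each response, listing missing answers separately
table(rating_factor, useNA = "ifany")

# Average numeric rating, ignoring the missing answer
mean(ratings, na.rm = TRUE)
```

Keeping both the numeric vector and the labeled factor is one possible design choice: the numbers are convenient for averages, while the factor makes tables and plots easier to read.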

3.2 Creating new variables

3.3 Recoding variables

3.4 Renaming variables

3.5 Missing values
