2 Starting with R and data


This chapter shows how to get started with R and how to import data into R from diverse sources. This will prepare you to work through the examples in the rest of this book. In this chapter we will:

  • Start working with R and data
  • Master R’s data frame structure
  • Load data into R
  • Start re-coding data for later analysis

Figure 2.1 is a diagram representing a mental model for this book, re-shaded to emphasize the purpose of this chapter: starting to work with R and importing data into R. The overall diagram is the data science process diagram from chapter 1 combined with a rebus form of this book's title. In each chapter we will re-shade this mental model to indicate the parts of the data science process we are emphasizing. For example, in this chapter we are mastering the initial steps of collecting and managing data, and touching on issues of practicality, data, and R (but not yet the art of science).

Figure 2.1. Chapter 2 Mental Model

Many data science projects start when someone points the analyst toward a bunch of data and the analyst is left to make sense of it.[5] A first thought may be to use ad hoc tools and spreadsheets to sort through it, but you will quickly realize that you’re spending more time tinkering with the tools than actually analyzing the data. Luckily, there’s a better way: using R. By the end of this chapter, you’ll be able to confidently use R to extract, transform, and load data for analysis.

2.1  Starting with R

2.1.1  Installing R

2.1.2  R programming

2.2  Working with data from files

2.2.1  Working with well-structured data from files or URLs

2.2.2  Using R with less-structured data

2.3  Working with relational databases

2.3.1  A production-size example

2.4  Summary