chapter two

2 Creating a dataset

 

This chapter covers

  • Exploring R data structures
  • Using data entry
  • Importing data
  • Annotating datasets

The first step in any data analysis is the creation of a dataset containing the information to be studied, in a format that meets your needs. In R, this task involves the following:

  • Selecting a data structure to hold your data
  • Entering or importing your data into the data structure

The first part of this chapter (sections 2.1–2.2) describes the wealth of structures that R can use to hold data. In particular, section 2.2 describes vectors, matrices, arrays, data frames, factors, lists, and tibbles. Familiarizing yourself with these structures (and the notation used to access elements within them) will help you tremendously in understanding how R works. You might want to take your time working through this section.
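As a quick preview of what these structures and their access notation look like, here is a minimal sketch in base R. The object names (ages, m, status, patients, info) and the values in them are made up for illustration; arrays and tibbles are left out to keep the example short.

```r
# A vector: one-dimensional, all elements of the same type
ages <- c(25, 34, 28, 52)

# A matrix: two-dimensional, all elements of the same type
# (values are filled column by column by default)
m <- matrix(1:6, nrow = 2, ncol = 3)

# A factor: categorical values stored together with a set of levels
status <- factor(c("Poor", "Improved", "Excellent", "Poor"))

# A data frame: a table whose columns can have different types
patients <- data.frame(id = 1:4, age = ages, status = status)

# A list: an ordered collection of arbitrary objects
info <- list(title = "My study", ages = ages, m = m)

# Notation for accessing elements
ages[2]          # second element of the vector
m[2, 3]          # row 2, column 3 of the matrix
patients$age     # the age column of the data frame
patients[1, ]    # the first row of the data frame
info[["title"]]  # the component named "title" in the list
```

Each structure has its own indexing notation, which is why it pays to learn them one at a time before moving on to data import.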

The second part of this chapter (section 2.3) covers the many methods available for importing data into R. Data can be entered manually or imported from an external source. These data sources can include text files, spreadsheets, statistical packages, and database-management systems. For example, the data that I work with typically comes as comma-delimited text files or Excel spreadsheets. On occasion, though, I receive data as SAS and SPSS datasets or through connections to SQL databases. It’s likely that you’ll have to use only one or two of the methods described in this section, so feel free to choose those that fit your situation.
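To give a flavor of the most common case, the following sketch reads a comma-delimited text file with base R's read.csv() function. The file is a temporary one created on the spot so that the example is self-contained; the path variable csv_path, the data frame name grades, and the file contents are all hypothetical.

```r
# Create a small comma-delimited file to import (illustrative data only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,status",
             "1,25,Poor",
             "2,34,Improved",
             "3,28,Excellent"),
           csv_path)

# Import the file into a data frame; the first line holds the column names
grades <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Inspect the structure of the imported data
str(grades)
```

In practice, you would replace csv_path with the location of your own file. The other sources listed above each need their own functions or packages.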

2.1 Understanding datasets

2.2 Data structures

2.2.1 Vectors

2.2.2 Matrices

2.2.3 Arrays

2.2.4 Data frames

2.2.5 Factors

2.2.6 Lists

2.2.7 Tibbles

2.3 Data input

2.3.1 Entering data from the keyboard

2.3.2 Importing data from a delimited text file

2.3.3 Importing data from Excel

2.3.4 Importing data from XML

2.3.5 Importing data from the Web

2.3.6 Importing data from SPSS

2.3.7 Importing data from SAS

2.3.8 Importing data from Stata

2.3.9 Accessing database management systems (DBMSs)

2.3.10 Importing data via Stat/Transfer