The first step in any data analysis is creating a dataset that contains the information to be studied, in a format that meets your needs. In R, this task involves the following:
- Selecting a data structure to hold your data
- Entering or importing your data into the data structure
Sections 2.1 and 2.2 of this chapter describe the wealth of structures that R can use to hold data. Section 2.2 in particular describes vectors, factors, matrices, data frames, lists, and tibbles. Familiarizing yourself with these structures (and the notation used to access elements within them) will help you tremendously in understanding how R works, so you may want to take your time working through section 2.2.
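As a brief preview, the sketch below creates one small instance of each structure named above and shows the bracket and `$` notation used to reach inside them. The variable names and values are made up for illustration. `tibble()` comes from the add-on tibble package, which is assumed rather than required: that step runs only if the package is installed.

```r
# A numeric vector: one-dimensional, all elements share one mode
ages <- c(25, 34, 28, 52)

# A factor: a categorical variable with a fixed set of levels
status <- factor(c("Poor", "Improved", "Excellent", "Poor"),
                 levels = c("Poor", "Improved", "Excellent"))

# A matrix: two-dimensional, all elements share one mode (filled by column)
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: columns may have different modes but must have equal lengths
patients <- data.frame(age = ages, status = status)

# A list: an ordered collection of arbitrary objects
bundle <- list(title = "Example", ages = ages, grid = m)

# A tibble: a modern take on the data frame (only if tibble is installed)
if (requireNamespace("tibble", quietly = TRUE)) {
  patients_tbl <- tibble::tibble(age = ages, status = status)
  print(patients_tbl)
}

# Accessing elements
ages[2]            # second element of the vector
m[2, 3]            # row 2, column 3 of the matrix
patients$status    # one column of the data frame
bundle[["grid"]]   # one component of the list
```

Because `matrix()` fills column by column, `m[2, 3]` is the last value entered, 6.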
Section 2.3 covers the many methods for getting data into R. Data can be entered manually or imported from an external source, such as a text file, a spreadsheet, a statistical package, or a database-management system. For example, the data that I work with typically comes as comma-delimited text files or Excel spreadsheets. On occasion, though, I receive data as SAS or SPSS datasets or through connections to SQL databases. You’ll likely need only one or two of the methods described in this section, so feel free to focus on those that fit your situation.
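To make the two routes concrete, here is a minimal sketch of manual entry and of importing comma-delimited text with base R's `read.csv()`. The `text =` argument stands in for a file path so the example runs without a file. The database step assumes the DBI and RSQLite packages (not mentioned above) and runs only if both are installed; the readxl and haven calls for Excel, SAS, and SPSS files are left as comments because their file names are hypothetical.

```r
# Manual entry: type the values directly into a data frame
mydata <- data.frame(
  age    = c(25, 34, 28),
  gender = c("M", "F", "F")
)

# Comma-delimited text: read.csv() parses CSV input. Normally you pass a
# file path; text= is used here so the example is self-contained.
csv_input <- "age,gender
25,M
34,F
28,F"
fromcsv <- read.csv(text = csv_input)
str(fromcsv)

# SQL database: an in-memory SQLite database accessed through DBI,
# run only if both DBI and RSQLite are installed
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "people", fromcsv)
  older <- DBI::dbGetQuery(con, "SELECT * FROM people WHERE age > 26")
  print(older)
  DBI::dbDisconnect(con)
}

# Spreadsheets and statistical packages follow the same pattern using
# add-on packages (these file names are hypothetical):
# readxl::read_excel("mydata.xlsx")
# haven::read_sas("mydata.sas7bdat")
# haven::read_sav("mydata.sav")
```

Whichever route you take, the result is an ordinary data frame, so the rest of your analysis does not depend on where the data came from.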