1 Introducing data and the R language

This chapter covers

  • Why data analysis is important
  • How to make your analysis robust
  • How and why R works with data
  • RStudio: Your interface to R

You have your data, and you want to start doing something awesome with it, right? Brilliant! I promise you, we’ll get to that as soon as we can. But first, let’s take a step back. Telling you to dive right in now would be like handing you a pile of different timbers, pointing you toward the workshop, and telling you to make some furniture. It’s a good idea to first understand both the materials and the tools you’re about to use.

We’ll go through what data means in general, both to you and to those who may inherit your data, because if you don’t fully understand what you already have, anything you build on it won’t be useful (and at worst will be flat-out wrong). Preparing data poorly merely delays dealing with it properly and grows your technical debt: you make things easier now, but you pay that time back later when you struggle to work with poorly formed data.

We’ll discuss how to set yourself up for a rigorous analysis (one that can be repeated) and then begin working with one of the best data analysis tools available: the R programming language. For now, let’s go through what it means to “have some data.”

1.1 Data: What, where, how?

1.1.1 What is data?

1.1.2 Seeing the world as data sources

1.1.3 Data munging

1.1.4 What you can do with well-handled data

1.1.5 Data as an asset

1.1.6 Reproducible research and version control

1.2 Introducing R

1.2.1 The origins of R

1.2.2 What R is and what it isn’t

1.3 How R works

1.4 Introducing RStudio

1.4.1 Working with R within RStudio

1.4.2 Built-in packages (data and functions)

1.4.3 Built-in documentation

1.4.4 Vignettes

1.5 Try it yourself

Terminology

Summary