Whenever we analyze data, the first thing we should do is look at it. For each variable, what are the most common values? How much variability is present? Are there any unusual observations? R provides a wealth of functions for visualizing data. In this chapter, we’ll look at graphs that help you understand a single categorical or continuous variable. This topic includes:
- Visualizing the distribution of a variable
- Comparing the distribution of a variable across two or more groups
In both cases, the variable can be continuous (for example, car mileage as miles per gallon) or categorical (for example, treatment outcome as none, some, or marked). In later chapters, we’ll explore graphs that display more complex relationships among variables.
The following sections explore the use of bar charts, pie charts, tree maps, histograms, kernel density plots, box plots, violin plots, and dot plots. Some of these may be familiar to you, whereas others (such as tree maps or violin plots) may be new. The goal, as always, is to understand your data better and to communicate this understanding to others. Let’s start with bar charts.