Part 2. Basic methods

In Part 1, we explored the R environment and discussed how to input data from various sources, combine and transform it, and prepare it for further analyses. Once your data has been input and cleaned up, the next step is typically to explore the variables one at a time. This gives you information about the distribution of each variable, which is useful for understanding the characteristics of the sample, identifying unexpected or problematic values, and selecting appropriate statistical methods. Next, variables are typically studied two at a time. This can help you uncover basic relationships among variables and is a useful first step in developing more complex models.

Part 2 focuses on graphical and statistical techniques for obtaining basic information about data. Chapter 6 describes methods for visualizing the distribution of individual variables. For categorical variables, these include bar plots, pie charts, and the newer tree maps. For numeric variables, they include histograms, density plots, box plots, dot plots, and the lesser-known violin plot. Each type of graph is useful for understanding the distribution of a single variable.