concept box plot in category R

R in Action, Second Edition: Data analysis and graphics with R

This is an excerpt from Manning's book R in Action, Second Edition: Data analysis and graphics with R.

Listing 3.4. Fine placement of figures in a graph

The following sections explore the use of bar plots, pie charts, fan plots, histograms, kernel density plots, box plots, violin plots, and dot plots. Some of these may be familiar to you, whereas others (such as fan plots or violin plots) may be new. The goal, as always, is to understand your data better and to communicate this understanding to others. Let’s start with bar plots.

In a call of the form boxplot(formula, data=dataframe), formula is an R model formula and dataframe denotes the data frame (or list) providing the data. An example of a formula is y ~ A, where a separate box plot for numeric variable y is generated for each value of categorical variable A. The formula y ~ A*B produces a box plot of numeric variable y for each combination of levels in categorical variables A and B.
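To make the formula notation concrete, here is a minimal sketch using the built-in mtcars data. The choice of mpg, cyl, and am as variables, and the use of plot=FALSE to return the group summaries without drawing, are assumptions made for illustration.

```r
# One grouping variable: one box per level of cyl.
one_factor <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# Two crossed grouping variables: one box per combination of am and cyl.
two_factor <- boxplot(mpg ~ am * cyl, data = mtcars, plot = FALSE)

# The names component lists the groups that would be drawn.
print(one_factor$names)
print(two_factor$names)

# Number of boxes in each case.
length(one_factor$names)
length(two_factor$names)
```

Dropping plot=FALSE draws the corresponding plots; the returned list is the same either way.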

Box plots are very versatile. By adding notch=TRUE, you get notched box plots. If two boxes’ notches don’t overlap, there’s strong evidence that their medians differ (Chambers et al., 1983, p. 62). The following code creates notched box plots for the mpg example:

boxplot(mpg ~ cyl, data=mtcars,
        notch=TRUE,
        varwidth=TRUE,
        col="red",
        main="Car Mileage Data",
        xlab="Number of Cylinders",
        ylab="Miles Per Gallon")

The col option fills the box plots with a red color, and varwidth=TRUE produces box plots with widths that are proportional to the square roots of their sample sizes.
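The notch comparison and the varwidth scaling can also be checked numerically. The sketch below relies on the conf and n components that boxplot() returns; the helper notches_overlap is a hypothetical name introduced here, not part of R or of the book.

```r
# Compute the box plot statistics without drawing.
stats <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# conf holds the notch limits: row 1 is the lower limit, row 2 the upper.
notches <- stats$conf
colnames(notches) <- stats$names
print(notches)

# Hypothetical helper: TRUE when the notch intervals of groups a and b overlap.
notches_overlap <- function(a, b) {
  notches[1, a] <= notches[2, b] && notches[1, b] <= notches[2, a]
}
print(notches_overlap("4", "6"))
print(notches_overlap("6", "8"))

# Group sizes, and the relative box widths that varwidth=TRUE is based on.
print(stats$n)
relative_width <- sqrt(stats$n) / max(sqrt(stats$n))
print(round(relative_width, 2))
```

A result of FALSE from notches_overlap corresponds to the visual check described above: the two notches do not overlap, which suggests the medians differ.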

Figure 6.13. Notched box plots for car mileage vs. number of cylinders

Finally, you can produce box plots for more than one grouping factor. Listing 6.9 provides box plots for mpg versus the number of cylinders and transmission type in an automobile (see figure 6.14). Again, you use the col option to fill the box plots with color. Note that colors recycle; in this case, there are six box plots and only two specified colors, so the colors repeat three times.

Figure 6.14. Box plots for car mileage vs. transmission type and number of cylinders
Listing 6.9. Box plots for two crossed factors
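A minimal sketch of box plots for two crossed factors is shown here. The copy named mt, the factor names cyl.f and am.f, the labels, the two colors, and the titles are assumptions chosen for illustration and may differ from the book's listing.

```r
# Work on a copy so the built-in mtcars data set is left unchanged.
mt <- mtcars

# Convert the grouping variables to labeled factors.
mt$cyl.f <- factor(mt$cyl, levels = c(4, 6, 8), labels = c("4", "6", "8"))
mt$am.f  <- factor(mt$am, levels = c(0, 1), labels = c("auto", "standard"))

# Two colors for six boxes: R recycles the vector across the boxes.
box_colors <- c("gold", "darkgreen")

result <- boxplot(mpg ~ am.f * cyl.f, data = mt,
                  varwidth = TRUE,
                  col = box_colors,
                  main = "MPG Distribution by Auto Type",
                  xlab = "Auto Type",
                  ylab = "Miles Per Gallon")

# Show which color each box receives after recycling.
recycled <- rep_len(box_colors, length(result$names))
print(data.frame(box = result$names, color = recycled))
```

Because the transmission factor comes first in the formula, it varies fastest along the x-axis, so the two colors alternate between automatic and standard within each cylinder group.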

From figure 6.14, it’s again clear that median mileage decreases with number of cylinders. For four- and six-cylinder cars, mileage is higher for standard transmissions. But for eight-cylinder cars, there doesn’t appear to be a difference. You can also see from the widths of the box plots that standard four-cylinder and automatic eight-cylinder cars are the most common in this dataset.
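These readings of the figure can be cross-checked with group summaries. The sketch below uses the raw am and cyl codes from mtcars (am is 0 for automatic and 1 for manual, called standard above); the object names are chosen for illustration.

```r
# Median mpg for each combination of transmission (rows) and cylinders (columns).
group_medians <- with(mtcars, tapply(mpg, list(am = am, cyl = cyl), median))
print(group_medians)

# Number of cars in each combination, which drives the box widths.
group_counts <- table(am = mtcars$am, cyl = mtcars$cyl)
print(group_counts)

# The largest groups correspond to the widest boxes.
print(max(group_counts))
```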
