6 Visualizing distributions

 

This chapter covers

  • Grouping data points into bins
  • Drawing a histogram
  • Comparing two distributions side by side with a pyramid chart
  • Calculating the quartiles of a dataset and generating box plots
  • Using violin plots to compare distributions of multiple categories

Visualizing distributions is a common task in data visualization. A distribution tells us how often data values fall within a specific bracket or, put another way, how likely a data point is to appear within a given range.
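To make the idea of "how often values occur within a bracket" concrete, here is a minimal sketch that counts values per fixed-width bracket. The salary values, the `binValues` helper, and the bracket width of 25,000 are all hypothetical and chosen for illustration. They are not taken from the survey data, and this is not necessarily how the chapter's project computes its bins.

```typescript
// Hypothetical salaries in US dollars (illustrative values, not survey data).
const salaries: number[] = [42000, 58000, 61000, 75000, 79000, 88000, 120000, 135000];

// A bracket covering the half-open range [x0, x1) and the number of values inside it.
interface Bin {
  x0: number;
  x1: number;
  count: number;
}

// Count how many values fall into each fixed-width bracket.
// Brackets that contain no values are omitted from the result.
function binValues(values: number[], binWidth: number): Bin[] {
  const counts: Record<number, number> = {};
  for (const v of values) {
    // Lower bound of the bracket that contains v.
    const x0 = Math.floor(v / binWidth) * binWidth;
    counts[x0] = (counts[x0] || 0) + 1;
  }
  return Object.keys(counts)
    .map(Number)
    .sort((a, b) => a - b)
    .map((x0) => ({ x0, x1: x0 + binWidth, count: counts[x0] }));
}

const bins = binValues(salaries, 25000);
```

With these sample values, the bracket from 75,000 up to (but not including) 100,000 holds three salaries, more than any other bracket. That is the kind of observation a distribution chart makes visible at a glance.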

In this chapter, we’ll study the distribution of salaries for data visualization practitioners based in the United States. The data behind the report we’ll build comes from the 2021 State of the Industry Survey hosted by the Data Visualization Society (DVS) (www.datavisualizationsociety.org). You can see this project in figure 6.1 or online at http://mng.bz/orvd.

For this project, we’ll start by building the most common representation of a data distribution, a histogram, to visualize the salaries of the survey’s 788 US-based, salaried respondents. We’ll then compare the wages of respondents identifying as women and men using two types of visualizations: a pyramid chart and box plots. The former is handy for comparing two categories side by side. The latter adds a layer of information that histograms lack, revealing the quartiles and median of a dataset.

6.1 Binning data

6.2 Drawing a histogram

6.3 Creating a pyramid chart

6.4 Generating box plots

6.4.1 Calculating quartiles with the quantile scale

6.4.2 Positioning multiple box plots on a chart

6.4.3 The point scale

6.4.4 Drawing a box plot

6.5 Comparing distributions with violin plots

Summary