8 Reshaping and pivoting

 

This chapter covers

  • Comparing wide and narrow data
  • Generating a pivot table from a DataFrame
  • Aggregating values by sum, average, count, and more
  • Stacking and unstacking DataFrame index levels
  • Melting a DataFrame

A data set can arrive in a format unsuited for the analysis that we’d like to perform on it. Sometimes, issues are confined to a specific column, row, or cell. A column may have the wrong data type, a row may have missing values, or a cell may have incorrect character casing. At other times, a data set may have larger structural problems that extend beyond the data. Perhaps the data set stores its values in a format that makes it easy to extract a single row but difficult to aggregate the data.

Reshaping a data set means manipulating it into a different shape, one that tells a story that could not be gleaned from its original presentation. Reshaping offers a new view or perspective on the data. This skill is critical; by one frequently cited estimate, 80% of data analysis consists of cleaning up data and contorting it into the proper shape.¹

¹ See Hadley Wickham, “Tidy Data,” Journal of Statistical Software, https://vita.had.co.nz/papers/tidy-data.pdf.
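To make the idea of reshaping concrete before the detailed sections, here is a minimal sketch in pandas. The data set (cities, days, and temperature readings) and the variable names narrow, wide, and back are invented for illustration; only the pivot_table, reset_index, and melt methods come from pandas itself. The same values are held first in a narrow shape, then in a wide one, and finally returned to the narrow shape.

```python
import pandas as pd

# Hypothetical narrow data: one row per (city, day) temperature reading.
narrow = pd.DataFrame({
    "city": ["Boston", "Boston", "Miami", "Miami"],
    "day": ["Mon", "Tue", "Mon", "Tue"],
    "temp": [50, 54, 80, 82],
})

# Narrow -> wide: one row per city, one column per day.
# aggfunc="mean" decides how duplicate (city, day) pairs would be combined;
# this sample has none, so each cell holds a single reading.
wide = narrow.pivot_table(
    index="city", columns="day", values="temp", aggfunc="mean"
)

# Wide -> narrow: move the day columns back into rows.
back = wide.reset_index().melt(
    id_vars="city", var_name="day", value_name="temp"
)

print(wide)
print(back)
```

The wide shape makes it easy to compare days side by side for one city, while the narrow shape is usually easier to filter and aggregate. Neither is more correct; the better choice depends on the question being asked.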

8.1 Wide vs. narrow data

8.2 Creating a pivot table from a DataFrame

8.2.1 The pivot_table method

8.2.2 Additional options for pivot tables

8.3 Stacking and unstacking index levels

8.4 Melting a data set

8.5 Exploding a list of values

8.6 Coding challenge

8.6.1 Problems

8.6.2 Solutions

Summary