chapter ten

10 Reshaping and Pivoting

 

This chapter covers:

  • The differences between wide and narrow data
  • Generating a pivot table from a DataFrame
  • Aggregating pivot table values by sum, average, count and more
  • Stacking and unstacking index levels in a DataFrame
  • Melting an aggregate DataFrame

As we've seen many times before, a dataset can arrive in a format that is not ideal for the analysis that we'd like to perform on it. One study even estimates that 80% of "analysis" consists of cleaning up data and contorting it into the proper shape[1].

Sometimes, issues can be confined to a specific column, row, or cell. For example, a column may have the wrong data type, a row may have missing values, and a cell may have incorrect character casing. Other times, there can be larger, structural problems with a dataset that extend beyond the data itself. As one example, the values may be stored in a format that makes it easy to extract a single row but difficult to perform an aggregate summary of the data, or vice versa.

To reshape a dataset means to manipulate it into a different shape, one that tells a new story that could not be properly told from its original presentation. Reshaping offers a new "view" or "perspective" on the data.

10.1       Wide vs. Narrow Data

10.2       Creating a Pivot Table from a DataFrame

10.2.1   The pivot_table Method

10.2.2   Additional Options for Pivot Tables

10.3       Stacking and Unstacking Index Levels

10.4       Melting a Dataset

10.4.1   Melting a Dataset

10.5       Exploding a List of Values

10.6       Coding Challenge

10.7       Summary