chapter eight

8 Reshaping and pivoting

 

This chapter covers:

  • The differences between wide and narrow data
  • Generating a pivot table from a DataFrame
  • Aggregating pivot table values by sum, average, count and more
  • Stacking and unstacking index levels in a DataFrame
  • Melting an aggregate DataFrame

As we've seen many times before, a dataset can arrive in a format that is not ideal for the analysis that we'd like to perform on it. One study even estimates that 80% of "analysis" consists of cleaning up data and contorting it into the proper shape[25].

Sometimes, issues can be confined to a specific column, row, or cell. For example, a column may have the wrong data type, a row may have missing values, and a cell may have incorrect character casing. Other times, there can be larger, structural problems with a dataset that extend beyond the data itself. As one example, the values may be stored in a format that makes it easy to extract a single row but difficult to perform an aggregate summary of the data, or vice versa.

To reshape a dataset means to manipulate it into a different shape, one that tells a new story that could not be properly told from its original presentation. Reshaping offers a new "view" or "perspective" on the data.

8.1       Wide vs. Narrow Data

8.2       Creating a Pivot Table from a DataFrame

8.2.1   The pivot_table Method

8.2.2   Additional Options for Pivot Tables

8.3       Stacking and Unstacking Index Levels

8.4       Melting a Dataset

8.4.1   Melting a Dataset

8.5       Exploding a List of Values

8.6       Coding Challenge

8.7       Summary