10 Reshaping and Pivoting
This chapter covers:
- The differences between wide and narrow data
- Generating a pivot table from a DataFrame
- Aggregating pivot table values by sum, average, count and more
- Stacking and unstacking index levels in a DataFrame
- Melting an aggregate DataFrame
As we've seen many times before, a dataset can arrive in a format that is not ideal for the analysis that we'd like to perform on it. One study even estimates that 80% of "analysis" consists of cleaning up data and contorting it into the proper shape[1].
Sometimes, issues can be confined to a specific column, row, or cell. For example, a column may have the wrong data type, a row may have missing values, and a cell may have incorrect character casing. Other times, there can be larger, structural problems with a dataset that extend beyond the data itself. As one example, the values may be stored in a format that makes it easy to extract a single row but difficult to perform an aggregate summary of the data, or vice versa.
To reshape a dataset means to manipulate it into a different shape, one that tells a new story that could not be properly told from its original presentation. Reshaping offers a new "view" or "perspective" on the data.