chapter eight

8 Reshaping and pivoting

This chapter covers:

The differences between wide and narrow data
Generating a pivot table from a DataFrame
Aggregating pivot table values by sum, average, count and more
Stacking and unstacking index levels in a DataFrame
Melting an aggregate DataFrame

As we've seen many times before, a dataset can arrive in a format that is not ideal for the analysis that we'd like to perform on it. One study even estimates that 80% of "analysis" consists of cleaning up data and contorting it into the proper shape[25].

Sometimes, issues can be confined to a specific column, row, or cell. For example, a column may have the wrong data type, a row may have missing values, and a cell may have incorrect character casing. Other times, there can be larger, structural problems with a dataset that extend beyond the data itself. As one example, the values may be stored in a format that makes it easy to extract a single row but difficult to perform an aggregate summary of the data, or vice versa.

To reshape a dataset means to manipulate it into a different shape, one that tells a new story that could not be properly told from its original presentation. Reshaping offers a new "view" or "perspective" on the data.

8.1 Wide vs. Narrow Data

8.2 Creating a Pivot Table from a DataFrame

8 Reshaping and pivoting

This chapter covers:

8.1 Wide vs. Narrow Data

8.2 Creating a Pivot Table from a DataFrame

8.2.1 The pivot_table Method

8.2.2 Additional Options for Pivot Tables

8.3 Stacking and Unstacking Index Levels

8.4 Melting a Dataset

8.4.1 Melting a Dataset

8.5 Exploding a List of Values

8.6 Coding Challenge

8.7 Summary