9 MultiIndex DataFrames
This chapter covers:
- Creating a MultiIndex object with multiple levels of data
- Indexing by label or position in a MultiIndex DataFrame
- Extracting a cross-section of a MultiIndex DataFrame
- Manipulating the index of a MultiIndex DataFrame
So far on our journey, we've explored the 1-dimensional Series and the 2-dimensional DataFrame objects. The number of dimensions is the number of points of reference required to extract a single value from a data structure. A Series needs only one label or index position to find a value. A DataFrame requires two points of reference: a label or index position for the row and a label or index position for the column.
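As a quick refresher, the sketch below shows one point of reference locating a Series value and two locating a DataFrame value. The data and variable names are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
prices = pd.Series([1.25, 0.50, 3.00], index=["Apple", "Banana", "Cherry"])

# A Series needs one point of reference: a label or an index position.
by_label = prices.loc["Banana"]
by_position = prices.iloc[1]

inventory = pd.DataFrame(
    {"Price": [1.25, 0.50, 3.00], "Quantity": [10, 25, 4]},
    index=["Apple", "Banana", "Cherry"],
)

# A DataFrame needs two points of reference: one for the row, one for the column.
cell_by_label = inventory.loc["Cherry", "Quantity"]
cell_by_position = inventory.iloc[2, 1]
```

Both lookups on the Series return the same value, as do both lookups on the DataFrame; only the way of addressing the value differs.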
Can we expand beyond two dimensions? Absolutely! Pandas provides support for datasets with any number of dimensions through the use of a MultiIndex. A MultiIndex is a special index object that consists of multiple levels or tiers. It is a good fit when a single column's value is insufficient to identify a row uniquely. Instead, we can combine values across two or more columns to serve as index labels. A DataFrame can hold a MultiIndex in the row index, the column index, or both. Adding layers to an index adds complexity, but it also adds versatility in the ways that a dataset can be sliced and diced. There's lots to explore, so let's dive right in!
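To make the idea concrete before we begin, here is a minimal sketch of a two-level row index. The states, cities, and employee counts are hypothetical; the point is that a city name alone is ambiguous here, while the combination of state and city identifies exactly one row.

```python
import pandas as pd

# Hypothetical data: "Springfield" appears in two states, so the city
# alone cannot identify a row. The (State, City) pair can.
index = pd.MultiIndex.from_tuples(
    [
        ("Illinois", "Chicago"),
        ("Illinois", "Springfield"),
        ("Missouri", "Springfield"),
    ],
    names=["State", "City"],
)

offices = pd.DataFrame({"Employees": [40, 12, 7]}, index=index)

# The index has two levels, so a full row label is a tuple of two values.
level_count = index.nlevels
springfield_mo = offices.loc[("Missouri", "Springfield"), "Employees"]
```

Extracting a single value from this DataFrame takes three points of reference: a state, a city, and a column name.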
9.1 The MultiIndex Object
As always, let's open up a fresh Jupyter Notebook, import the pandas library, and assign it the alias pd.