chapter nine

9 MultiIndex DataFrames

This chapter covers:

  • Creating a MultiIndex object with multiple levels of data
  • Indexing by label or position in a MultiIndex DataFrame
  • Extracting a cross-section of a MultiIndex DataFrame
  • Manipulating the index of a MultiIndex DataFrame

So far on our journey, we've explored the one-dimensional Series and the two-dimensional DataFrame. The number of dimensions is the number of points of reference required to extract a single value from a data structure. A Series needs only one label or index position to find a value. A DataFrame requires two points of reference: a label or index position for the row and a label or index position for the column.
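To make the idea of points of reference concrete, here is a minimal sketch. The temperatures and weather objects and all of their values are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
temperatures = pd.Series([62, 71, 68], index=["Mon", "Tue", "Wed"])

# One point of reference (a label or a position) identifies a Series value.
by_label = temperatures.loc["Tue"]
by_position = temperatures.iloc[1]

weather = pd.DataFrame(
    {"High": [62, 71, 68], "Low": [48, 55, 51]},
    index=["Mon", "Tue", "Wed"],
)

# Two points of reference (a row and a column) identify a DataFrame value.
cell_by_label = weather.loc["Tue", "Low"]
cell_by_position = weather.iloc[1, 1]
```

Both Series lookups return the same value, as do both DataFrame lookups; only the kind of reference (label versus position) differs.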

Can we expand beyond two dimensions? Absolutely! Pandas provides support for datasets with any number of dimensions through the use of a MultiIndex. A MultiIndex is an index object that consists of multiple levels, or tiers. It is a good fit when a single column's values are insufficient to identify a row uniquely; in that case, we can combine values across two or more columns to serve as index labels. A DataFrame can hold a MultiIndex on the row axis, the column axis, or both. Adding levels to an index adds complexity, but it also adds versatility in the ways we can slice and dice a dataset. There's lots to explore, so let's dive right in!
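As a rough sketch of that situation, consider a small dataset in which neither column identifies a row by itself. The stores DataFrame, its column names, and its values are all hypothetical; set_index with a list of columns is one way to combine them into a two-level index.

```python
import pandas as pd

# Hypothetical data: "Albany" appears in two states and each state appears
# twice, so neither column alone can identify a single row.
stores = pd.DataFrame(
    {
        "State": ["CA", "CA", "NY", "NY"],
        "City": ["Albany", "Fresno", "Albany", "Buffalo"],
        "Stores": [2, 4, 3, 5],
    }
)

# Combine both columns into a two-level MultiIndex on the row axis.
by_location = stores.set_index(["State", "City"])

level_count = by_location.index.nlevels

# A tuple with one label per level identifies a single row.
ny_albany = by_location.loc[("NY", "Albany"), "Stores"]
```

Here the tuple ("NY", "Albany") plays the role that a single label plays in a regular index.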

9.1      The MultiIndex Object

As always, let's open up a fresh Jupyter Notebook, import the pandas library, and assign it the alias pd.
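The import looks like the following. The lines after it are only a sketch of what a standalone MultiIndex can look like, built with the MultiIndex.from_tuples class method; the street and city values and the level names are hypothetical.

```python
import pandas as pd

# Hypothetical (street, city) pairs, invented for illustration only.
address_tuples = [
    ("12 Oak Lane", "Springfield"),
    ("12 Oak Lane", "Riverton"),
    ("40 Elm Road", "Springfield"),
]

# Each tuple becomes one index label; each tuple position becomes one level.
address_index = pd.MultiIndex.from_tuples(
    address_tuples, names=["Street", "City"]
)

first_label = address_index[0]
cities = list(address_index.get_level_values("City"))
```

Each label in the resulting index is a tuple, and get_level_values pulls out the values stored at a single level.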

9.2      MultiIndex DataFrames

9.3      Sorting a MultiIndex

9.4      Indexing with a MultiIndex

9.4.1   Extracting One or More Columns

9.4.2   Extracting One or More Rows with loc

9.4.3   Extracting One or More Rows with iloc

9.5      Cross Sections

9.6      Manipulating the Index

9.6.1   Resetting the Index

9.6.2   Setting the Index

9.7      Summary