7 MultiIndex DataFrames

This chapter covers

  • Creating a MultiIndex
  • Selecting rows and columns from a MultiIndex DataFrame
  • Extracting a cross-section from a MultiIndex DataFrame
  • Swapping MultiIndex levels

So far on our pandas journey, we’ve explored the one-dimensional Series and the two-dimensional DataFrame. The number of dimensions is the number of reference points we need to extract a value from a data structure. We need only one reference point, a label or an index position, to locate a value in a Series. We need two reference points to locate a value in a DataFrame: a label or index position for the row and another for the column. Can we expand beyond two dimensions? Absolutely! Pandas supports data sets with any number of dimensions through the use of a MultiIndex.
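To make the idea of reference points concrete, here is a minimal sketch. The names and scores are hypothetical data invented for illustration; they do not come from the chapter's data set.

```python
import pandas as pd

# Hypothetical data, used only to illustrate reference points.
quiz = pd.Series([90, 72, 85], index=["Ann", "Bo", "Cy"])

# A Series is one-dimensional: one label is enough to locate a value.
one_ref = quiz.loc["Bo"]
print(one_ref)  # 72

grades = pd.DataFrame(
    {"Math": [90, 72], "Art": [88, 95]},
    index=["Ann", "Bo"],
)

# A DataFrame is two-dimensional: we need a row label and a column label.
two_refs = grades.loc["Bo", "Art"]
print(two_refs)  # 95
```

The same lookups work with integer positions through `iloc`, for example `quiz.iloc[1]` and `grades.iloc[1, 1]`.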

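A MultiIndex is an index object that holds multiple levels. Each level stores one value for each row. A MultiIndex is the best choice when a combination of values, rather than any single value, identifies a row of data. Consider the data set in figure 7.1, which stores stock prices across multiple dates. The same stock can appear on several dates, and the same date can appear for several stocks, so a stock and a date together make a better row identifier than either one alone.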

Figure 7.1 Sample data set with Stock, Date, and Price columns
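As a rough sketch of the idea, we can build a small DataFrame with the same three columns and promote Stock and Date to a two-level index. The tickers, dates, and prices below are placeholder values assumed for illustration; they are not the values shown in figure 7.1, and the dates are kept as plain strings for simplicity.

```python
import pandas as pd

# Placeholder rows (assumed values, not the data from figure 7.1).
stocks = pd.DataFrame({
    "Stock": ["AAA", "AAA", "BBB", "BBB"],
    "Date": ["2020-01-02", "2020-01-03", "2020-01-02", "2020-01-03"],
    "Price": [10.0, 10.5, 20.0, 19.5],
})

# Neither column identifies a row on its own: both contain repeats.
stock_repeats = stocks["Stock"].duplicated().any()
date_repeats = stocks["Date"].duplicated().any()

# Passing a list of columns to set_index creates a MultiIndex.
indexed = stocks.set_index(["Stock", "Date"])
print(indexed.index.nlevels)    # 2
print(indexed.index.is_unique)  # True

# A tuple with one value per level locates a single row.
price = indexed.loc[("BBB", "2020-01-03"), "Price"]
print(price)  # 19.5
```

With the MultiIndex in place, locating a price takes three reference points: a stock, a date, and a column name.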

7.1 The MultiIndex object

7.2 MultiIndex DataFrames

7.3 Sorting a MultiIndex

7.4 Selecting with a MultiIndex

7.4.1 Extracting one or more columns

7.4.2 Extracting one or more rows with loc

7.4.3 Extracting one or more rows with iloc

7.5 Cross-sections

7.6 Manipulating the Index

7.6.1 Resetting the index

7.6.2 Setting the index

7.7 Coding challenge

7.7.1 Problems

7.7.2 Solutions

Summary