Pandas in Action MEAP V07

This is an excerpt from Manning's book Pandas in Action MEAP V07.

Can we expand beyond two dimensions? Absolutely! Pandas supports datasets with any number of dimensions through the MultiIndex. A MultiIndex is a special index object that consists of multiple levels, or tiers. A MultiIndex is most useful when a single column's values are insufficient to identify a row uniquely; instead, we can combine values from two or more columns to serve as the index labels. A DataFrame can hold a MultiIndex in its row index, its column index, or both. Adding layers to an index adds a good deal of complexity, but also a lot of flexibility in how a dataset can be sliced and diced. There's lots to explore, so let's dive right in!
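To make the idea of combining columns concrete, here is a minimal sketch using DataFrame.set_index, which is one common way to turn existing columns into a MultiIndex (not necessarily the approach this chapter takes). The city/state rows and the Value column are hypothetical: neither column alone is unique, but every (city, state) pair is.

```python
import pandas as pd

# Hypothetical data: "City" repeats and "State" repeats,
# but each (City, State) pair identifies exactly one row.
df = pd.DataFrame({
    "City": ["Portland", "Portland", "Springfield", "Springfield"],
    "State": ["ME", "OR", "IL", "OR"],
    "Value": [10, 20, 30, 40],
})

# Combine the two columns into a two-level row index.
indexed = df.set_index(["City", "State"])

print(type(indexed.index).__name__)                  # MultiIndex
print(indexed.index.nlevels)                         # 2
print(indexed.loc[("Springfield", "OR"), "Value"])   # 40
```

The pair ("Springfield", "OR") succeeds as a label where "Springfield" or "OR" alone would match two rows each.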

Now imagine a list of tuples, each holding one address's street, city, state, and zip code, serving as the index of a DataFrame. Instead of being referenced by a simple value like a string, each row would be referenced by a tuple holding multiple elements. That's a good way to start thinking about the MultiIndex object in Pandas: it's a more complex index that holds multiple tiers of data.
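The addresses list used in the examples below is not reproduced in this excerpt. The sketch reconstructs it from the printed output of In [7] (an assumption about its exact form), then shows that pandas' generic Index constructor already converts a list of equal-length tuples into a MultiIndex, with each element being a tuple label.

```python
import pandas as pd

# Reconstructed from the output of In [7]; each tuple is
# (street, city, state, zip) for one address.
addresses = [
    ("8809 Flair Square", "Toddside", "IL", "37206"),
    ("9901 Austin Street", "Toddside", "IL", "37206"),
    ("905 Hogan Quarter", "Franklin", "IL", "37206"),
    ("72 Savage Lane", "Talkanooga", "TN", "37341"),
]

# By default (tupleize_cols=True), pd.Index builds a MultiIndex
# when every element is a tuple.
idx = pd.Index(addresses)

print(type(idx).__name__)  # MultiIndex
print(idx.nlevels)         # 4
print(idx[0])              # ('8809 Flair Square', 'Toddside', 'IL', '37206')
```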

We can actually create a MultiIndex object without attaching it to a data structure like a Series or DataFrame. The MultiIndex class is available as a top-level attribute on the pandas library, which is referenced by the alias pd in our Jupyter Notebook. The class includes a from_tuples class method that instantiates a MultiIndex from a list of tuples. Let's give it a shot with our addresses list from above.

In  [7] pd.MultiIndex.from_tuples(tuples = addresses)
 
Out [7] MultiIndex([( '8809 Flair Square',   'Toddside', 'IL', '37206'),
                    ('9901 Austin Street',   'Toddside', 'IL', '37206'),
                    ( '905 Hogan Quarter',   'Franklin', 'IL', '37206'),
                    (    '72 Savage Lane', 'Talkanooga', 'TN', '37341')],
                   )

Let's take a second look at the tuples. Each index position holds the same kind of value across all of them. For example, the values at index position 0 represent the street address, and the values at index position 1 represent the city. In a MultiIndex, we can think of each level as a collection of related labels. We can assign a name to each level by passing a list to the from_tuples method's names parameter. Below, we pass the list ["Street", "City", "State", "Zip"].

In  [8] my_index = pd.MultiIndex.from_tuples(
            tuples = addresses,
            names = ["Street", "City", "State", "Zip"]
        )
 
        my_index
 
Out [8] MultiIndex([( '8809 Flair Square',   'Toddside', 'IL', '37206'),
                    ('9901 Austin Street',   'Toddside', 'IL', '37206'),
                    ( '905 Hogan Quarter',   'Franklin', 'IL', '37206'),
                    (    '72 Savage Lane', 'Talkanooga', 'TN', '37341')],
                    names=['Street', 'City', 'State', 'Zip'])
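Once the levels have names, each one can be pulled out as an ordinary collection of related labels. This sketch uses MultiIndex.get_level_values and the levels attribute; the addresses list is again reconstructed from the output of In [7], so treat it as an assumption.

```python
import pandas as pd

# Reconstructed from the output of In [7].
addresses = [
    ("8809 Flair Square", "Toddside", "IL", "37206"),
    ("9901 Austin Street", "Toddside", "IL", "37206"),
    ("905 Hogan Quarter", "Franklin", "IL", "37206"),
    ("72 Savage Lane", "Talkanooga", "TN", "37341"),
]

my_index = pd.MultiIndex.from_tuples(
    tuples=addresses,
    names=["Street", "City", "State", "Zip"],
)

print(list(my_index.names))  # ['Street', 'City', 'State', 'Zip']

# One label per row for the "City" level, in row order.
print(list(my_index.get_level_values("City")))
# ['Toddside', 'Toddside', 'Franklin', 'Talkanooga']

# The level itself stores only the unique labels, sorted.
print(list(my_index.levels[1]))  # ['Franklin', 'Talkanooga', 'Toddside']
```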

Now that we can visualize what a MultiIndex object stores, let's attach it to a DataFrame! One easy way to do so is via the index parameter in the DataFrame constructor. We passed this parameter a list of strings in earlier chapters, but we can also pass it any Pandas index object, such as the MultiIndex assigned to the my_index variable above. Because our MultiIndex has four tuples, we'll need to provide four rows of data.
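The row-count requirement is enforced by the constructor. As a quick check, this sketch passes only three rows of data for the four-entry index and catches the resulting ValueError (the exact message wording varies across pandas versions, so the code does not rely on it). The addresses list is reconstructed from the output of In [7].

```python
import pandas as pd

# Reconstructed from the output of In [7].
addresses = [
    ("8809 Flair Square", "Toddside", "IL", "37206"),
    ("9901 Austin Street", "Toddside", "IL", "37206"),
    ("905 Hogan Quarter", "Franklin", "IL", "37206"),
    ("72 Savage Lane", "Talkanooga", "TN", "37341"),
]
my_index = pd.MultiIndex.from_tuples(
    tuples=addresses, names=["Street", "City", "State", "Zip"]
)
columns = ["Schools", "Cost of Living"]

# Only three rows of data for a four-entry index: the shapes disagree.
short_data = [["A", "B+"], ["C+", "C"], ["D-", "A"]]

error = None
try:
    pd.DataFrame(data=short_data, index=my_index, columns=columns)
except ValueError as err:
    error = err
    print("ValueError:", err)
```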

In  [9] data = [
            ["A", "B+"],
            ["C+", "C"],
            ["D-", "A"],
            ["B-", "F"]
        ]
 
        columns = ["Schools", "Cost of Living"]
 
        area_grades = pd.DataFrame(data = data,
                                   index = my_index,
                                   columns = columns)
 
        area_grades
 
Out [9]
 
                                       Schools Cost of Living
Street          City       State Zip
8809 Flair S... Toddside   IL    37206       A             B+
9901 Austin ... Toddside   IL    37206      C+              C
905 Hogan Qu... Franklin   IL    37206      D-              A
72 Savage Lane  Talkanooga TN    37341      B-              F

We now have a DataFrame with a MultiIndex for its rows. Each row's label or reference point is a tuple holding four values: a street, a city, a state and a zip code.
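Because each row's label is a four-element tuple, passing the full tuple to .loc selects exactly one row. A minimal sketch, rebuilding area_grades from In [9] with the addresses list reconstructed from the output of In [7]:

```python
import pandas as pd

# Reconstructed from the output of In [7].
addresses = [
    ("8809 Flair Square", "Toddside", "IL", "37206"),
    ("9901 Austin Street", "Toddside", "IL", "37206"),
    ("905 Hogan Quarter", "Franklin", "IL", "37206"),
    ("72 Savage Lane", "Talkanooga", "TN", "37341"),
]
my_index = pd.MultiIndex.from_tuples(
    tuples=addresses, names=["Street", "City", "State", "Zip"]
)
area_grades = pd.DataFrame(
    data=[["A", "B+"], ["C+", "C"], ["D-", "A"], ["B-", "F"]],
    index=my_index,
    columns=["Schools", "Cost of Living"],
)

# The full (street, city, state, zip) tuple is the row label.
label = ("72 Savage Lane", "Talkanooga", "TN", "37341")
print(area_grades.loc[label, "Schools"])         # B-
print(area_grades.loc[label, "Cost of Living"])  # F

# Looking the row up by position returns the same tuple as its label.
print(area_grades.index[3] == label)  # True
```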
