9 The GroupBy object
This chapter covers:
- Using a GroupBy object to store multiple DataFrames
- Extracting first and last rows from each DataFrame in a GroupBy object
- Performing aggregate operations on groups
- Iterating over each DataFrame in a GroupBy object
When working with a dataset, an analyst may want to isolate groups of rows that share a common value in a column. Once these segments have been identified, it becomes easier to perform an aggregate analysis on each group, that is, on each collection of related rows. The pandas library's GroupBy object provides a simple but effective way to split an existing DataFrame into multiple smaller DataFrames. It acts as a storage container for these DataFrames and provides a set of methods to analyze them.
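As a quick preview of the split-then-aggregate idea described above, here is a minimal sketch. The `sales` DataFrame, its column names, and its values are hypothetical and invented purely for illustration; they are not the dataset used in this chapter.

```python
import pandas as pd

# Hypothetical data, made up for this sketch only.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "revenue": [100, 200, 150, 50],
})

# groupby buckets the rows by the unique values in the "region" column
# and returns a GroupBy object that holds one group per value.
groups = sales.groupby("region")

# Each group can be pulled out as its own, smaller DataFrame.
east = groups.get_group("East")

# Aggregate methods run on every group at once.
totals = groups["revenue"].sum()
print(totals)
```

Here `groups` holds two groups, one per region, and `totals` is a Series with one summed revenue value for each of them. The rest of the chapter walks through these steps in more detail.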
9.1 Creating a GroupBy object from scratch
Let's create a new Jupyter Notebook and import the pandas library.
In [1] import pandas as pd
We'll kick things off with a small example and cover the technical details in the next section. Let's create a simple DataFrame consisting of five rows of fruit and vegetable prices at a supermarket. Each item is classified as either a fruit or a vegetable.