chapter nine

9 The GroupBy Object

 

This chapter covers:

  • Using a GroupBy object to store multiple DataFrames
  • Extracting first and last rows from each DataFrame in a GroupBy object
  • Performing aggregate operations on groups
  • Iterating over each DataFrame in a GroupBy object

When working with a dataset, an analyst may want to isolate groups of rows that share a common value in a column. Once these groups have been identified, it becomes easier to perform an aggregate analysis on each one, that is, on each collection of related rows. The pandas library's GroupBy object provides a simple but effective way to split an existing DataFrame into multiple smaller DataFrames. It acts as a storage container for these DataFrames and provides a set of methods to analyze them.

9.1       Creating a GroupBy Object from Scratch

Let's create a new Jupyter Notebook and import the pandas library.

In  [1] import pandas as pd

We'll kick things off with a small example and explain more of the technical details in the next section. Let's create a simple DataFrame. It will consist of five rows of fruit and vegetable prices for a supermarket. Each item will be classified as either a fruit or a vegetable.
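
As a minimal sketch of what such a DataFrame might look like, the code below builds five rows and groups them by category. The item names, the prices, the column names (Item, Type, Price), and the variable names (supermarket, groups) are all assumptions chosen for illustration; they do not come from a real dataset.

```python
import pandas as pd

# Hypothetical sample data: items and prices are invented for illustration.
food_data = {
    "Item": ["Banana", "Cucumber", "Orange", "Carrot", "Watermelon"],
    "Type": ["Fruit", "Vegetable", "Fruit", "Vegetable", "Fruit"],
    "Price": [0.99, 1.25, 0.25, 0.33, 3.00],
}

supermarket = pd.DataFrame(data=food_data)

# Split the rows into one group per unique value in the "Type" column.
# The result is a GroupBy object, not a DataFrame.
groups = supermarket.groupby("Type")

# Count the rows that landed in each group.
group_sizes = groups.size()
```

With this setup, calling groups.get_group("Fruit") would return a DataFrame holding only the fruit rows.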

9.2       Creating a GroupBy Object from a Dataset

9.3       Attributes and Methods on a GroupBy Object

9.4       Aggregate Operations

9.5       Applying an Operation to All Groups

9.6       Grouping by Multiple Columns

9.7       Coding Challenge

9.8       Summary