9 The GroupBy object

This chapter covers

  • Splitting a DataFrame into groups by using the groupby method
  • Extracting first and last rows from groups in a GroupBy object
  • Performing aggregate operations on GroupBy groups
  • Iterating over DataFrames in a GroupBy object

The pandas library’s GroupBy object is a storage container for grouping DataFrame rows into buckets. It provides a set of methods to aggregate and analyze each independent group in the collection. It allows us to extract rows at specific index positions within each group. It also offers a convenient way to iterate over the groups of rows. There’s lots of power packed into a GroupBy object, so let’s see what it’s capable of doing.
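The sketch below illustrates those capabilities on a small DataFrame. The column names `Item`, `Type`, and `Price` and all of the values are made up for this example. It groups rows by `Type`, aggregates each group, pulls out the first, last, and second row of each group, and then iterates over the groups.

```python
import pandas as pd

# Hypothetical data for illustration only; not taken from the chapter.
df = pd.DataFrame({
    "Item": ["Apple", "Carrot", "Pear", "Leek", "Plum"],
    "Type": ["Fruit", "Vegetable", "Fruit", "Vegetable", "Fruit"],
    "Price": [1.00, 0.50, 1.50, 2.00, 0.75],
})

# groupby returns a DataFrameGroupBy object; no aggregation happens yet.
groups = df.groupby("Type")

# Aggregate: one value per group (here, the mean price).
mean_prices = groups["Price"].mean()

# first/last return the first/last non-null value of each column per group;
# with no missing data, that matches each group's first/last row.
first_rows = groups.first()
last_rows = groups.last()

# Row at a specific position within each group (0-based, so 1 = second row).
second_rows = groups.nth(1)

# Iterate: each step yields the group key and that group's sub-DataFrame.
for type_name, group_df in groups:
    print(type_name, len(group_df))
```

Note that the shape of the `nth` result varies by pandas version. Recent releases keep the original row labels, while older ones index the result by group key. The selected rows are the same either way.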

9.1 Creating a GroupBy object from scratch

Let’s create a new Jupyter Notebook and import the pandas library:

In  [1] import pandas as pd

We’ll kick things off with a small example and dive into more of the technical details in section 9.2. Let’s begin by creating a DataFrame that stores the prices of fruits and vegetables in a supermarket:
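The chapter's supermarket DataFrame does not survive in this excerpt, so the sketch below substitutes made-up items and prices. It builds a comparable DataFrame, groups it by a hypothetical `Type` column, and inspects the resulting GroupBy object.

```python
import pandas as pd

# Hypothetical supermarket prices; items and values are illustrative only.
supermarket = pd.DataFrame({
    "Item": ["Banana", "Cucumber", "Orange", "Tomato", "Watermelon"],
    "Type": ["Fruit", "Vegetable", "Fruit", "Vegetable", "Fruit"],
    "Price": [0.50, 1.20, 0.80, 0.40, 3.50],
})

# Split the rows into buckets keyed by the unique values in "Type".
groups_by_type = supermarket.groupby("Type")

# Number of rows in each bucket (a Series indexed by group key).
sizes = groups_by_type.size()

# Dict-like mapping of each group key to the index labels of its rows.
row_labels = groups_by_type.groups

# Pull out a single bucket as an ordinary DataFrame.
fruits = groups_by_type.get_group("Fruit")
```

Because `get_group` returns a regular DataFrame, every DataFrame method works on `fruits`.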

9.2 Creating a GroupBy object from a data set

9.3 Attributes and methods of a GroupBy object

9.4 Aggregate operations

9.5 Applying a custom operation to all groups

9.6 Grouping by multiple columns

9.7 Coding challenge

9.7.1 Problems

9.7.2 Solutions

Summary