4 The DataFrame object

 

This chapter covers

  • Instantiating DataFrame objects from dictionaries and NumPy ndarrays
  • Importing DataFrames from CSV files with the read_csv function
  • Sorting DataFrame columns
  • Accessing rows and columns in a DataFrame
  • Setting and resetting a DataFrame index
  • Renaming columns and index labels in a DataFrame

The pandas DataFrame is a two-dimensional table of data with rows and columns. As with a Series, pandas assigns an index label and an index position to each DataFrame row. Pandas also assigns a label and a position to each column. The DataFrame is two-dimensional because it requires two points of reference—a row and a column—to isolate a value from the data set. Figure 4.1 displays a visual example of a pandas DataFrame.

Figure 4.1 A visual representation of a pandas DataFrame with five rows and two columns
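
To make the idea of two points of reference concrete, here is a minimal sketch. The column names, row labels, and values are invented for illustration and are not the data shown in figure 4.1. The sketch uses the standard pandas loc accessor to look up a value by label and the iloc accessor to look up the same value by position:

```python
import pandas as pd

# Hypothetical data: five rows and two columns, invented for illustration
df = pd.DataFrame(
    data={
        "Price": [10, 20, 30, 40, 50],
        "Quantity": [1, 2, 3, 4, 5],
    },
    index=["A", "B", "C", "D", "E"],  # custom row labels
)

# Isolate one value by label: row label "C", column label "Quantity"
by_label = df.loc["C", "Quantity"]

# Isolate the same value by position: third row, second column
by_position = df.iloc[2, 1]

print(by_label, by_position)
```

Both lookups return the same value because the row labeled C sits at position 2 and the Quantity column sits at position 1; positions start counting at 0.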

The DataFrame is the workhorse of the pandas library and the data structure you’ll work with most often, so we’ll spend the remainder of this book exploring its features.

4.1 Overview of a DataFrame

As always, let’s spin up a new Jupyter Notebook and import pandas. We also need the NumPy library, which we’ll use in section 4.1.2 to generate random data. NumPy is usually assigned the alias np:

In  [1] import pandas as pd
        import numpy as np

4.1.1 Creating a DataFrame from a dictionary

4.1.2 Creating a DataFrame from a NumPy ndarray

4.2 Similarities between Series and DataFrames

4.2.1 Importing a DataFrame with the read_csv function

4.2.2 Shared and exclusive attributes of Series and DataFrames

4.2.3 Shared methods of Series and DataFrames

4.3 Sorting a DataFrame

4.3.1 Sorting by a single column

4.3.2 Sorting by multiple columns

4.4 Sorting by index

4.4.1 Sorting by row index

4.4.2 Sorting by column index

4.5 Setting a new index