chapter four

4 The DataFrame Object

 

This chapter covers:

  • Instantiating a DataFrame object from a dictionary and a NumPy ndarray
  • Importing a multidimensional dataset with the read_csv function
  • Sorting one or more columns in a DataFrame
  • Accessing rows and columns from a DataFrame
  • Setting and resetting the index of a DataFrame
  • Renaming column and index values

4.1   Overview of a DataFrame

The workhorse of the Pandas library, the DataFrame is a two-dimensional data structure consisting of rows and columns. Because it has two dimensions, two points of reference (a row and a column) are needed to extract any given value from the dataset. A DataFrame can be described as a grid or table of data, similar to one you'd find in a spreadsheet application like Excel.

4.1.1   Creating a DataFrame from a Dictionary

As always, let's begin by importing Pandas. We'll also use the NumPy library to generate some random data. NumPy is commonly assigned the alias np.

In  [1] import pandas as pd
        import numpy as np

Before we import our first dataset, let's practice instantiating a DataFrame from some native Python objects. One suitable data structure is a dictionary: each key will serve as a column name, and the corresponding value will supply that column's values.
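As a minimal sketch of the dictionary approach, the example below passes a small dictionary to the DataFrame constructor. The variable names and sample values are invented purely for illustration. Each value is a list, and the lists are assumed to be the same length so that every column has one value per row.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only.
# Each key becomes a column name; each list holds that column's values.
people_data = {
    "Name": ["Ava", "Ben", "Cal"],
    "Age": [31, 45, 27],
}

people = pd.DataFrame(data=people_data)

# The column names come from the dictionary's keys, in insertion order.
print(people.columns.tolist())

# With no index supplied, Pandas assigns row labels 0, 1, 2.
# A row label plus a column name identifies a single value.
print(people.loc[1, "Age"])
```

Because no index is passed here, Pandas falls back to a default numeric index starting at 0, which is why the row label 1 refers to the second row.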

4.1.2   Creating a DataFrame from a NumPy ndarray

4.2   Similarities between Series and DataFrames

4.2.1   Importing a CSV File with the read_csv Function

4.2.2   Shared and Exclusive Attributes between Series and DataFrames

4.2.3   Shared Methods between Series and DataFrames

4.3   Sorting a DataFrame

4.3.1   Sort by Single Column

4.3.2   Sort by Multiple Columns

4.4   Sort by Index

4.4.1   Sort by Row Index

4.4.2   Sort by Column Index

4.5   Setting a New Index

4.6   Selecting Columns or Rows from a DataFrame

4.6.1   Select a Single Column from a DataFrame

4.6.2   Select Multiple Columns from a DataFrame

4.7   Select Rows from a DataFrame

4.7.1   Extract Rows by Index Label

4.7.2   Extract Rows by Index Position

4.7.3   Extract Values from Specific Columns

4.8   Extract Value from Series

4.9   Rename Column or Row

4.10   Resetting an Index

4.11   Coding Challenge

4.12   Summary