Appendix B. Python Pandas DataFrame

 

This appendix covers

  • An overview of the Pandas DataFrame
  • The DataFrame methods used in this book

This appendix gives an overview of the Pandas DataFrame and describes the DataFrame methods used in this book.

B.1 An Overview of Pandas DataFrame

Pandas is a Python library for loading, manipulating, analyzing, and visualizing data. In this book, we use the Pandas DataFrame, a two-dimensional structure composed of rows and columns. Because a DataFrame stores data in tabular form, you can filter, transform, and aggregate the data with a few method calls.

You can create a Pandas DataFrame in several ways. This book uses two of them: building it from a Python dictionary and loading it from a CSV file. You can download the code described in this appendix from the GitHub repository of the book, under AppendixB/Pandas DataFrame.ipynb.

B.1.1 Building from a Dictionary

Consider the following example, which creates a Pandas DataFrame from a Python dictionary:

Listing B.1 Creating a DataFrame from a dictionary
import pandas as pd
 
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
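    'Grade': ['A', 'B', 'A']
}  #A Each key becomes a column; None marks a missing value

df = pd.DataFrame(data)  #B Builds the DataFrame from the dictionary

df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')  #C Converts the BirthDate strings to datetime values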
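After building a DataFrame, it is often useful to check its shape, the type of each column, and where values are missing. The following is a minimal sketch rather than part of the book's listing. It repeats the data from Listing B.1 so that it runs on its own, and the variable name missing_per_column is an assumption introduced for illustration.

```python
import pandas as pd

# Same data as Listing B.1, repeated so this sketch is self-contained.
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
    'Grade': ['A', 'B', 'A'],
}
df = pd.DataFrame(data)
df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')

# Number of rows and columns.
print(df.shape)  # (3, 6)

# One dtype per column. Numeric columns that contain None are stored
# as floats, because the missing value is represented as NaN.
print(df.dtypes)

# Count the missing values in each column (hypothetical helper name).
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

In this sketch, MathsScore and ChemistryScore each report one missing value, while PhysicsScore reports none and keeps an integer type.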

B.1.2 Building from a CSV file
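As a minimal sketch of loading a DataFrame from CSV data, the example below uses pd.read_csv(). The CSV content, the variable names csv_text and df_csv, and the file name mentioned in the comment are assumptions made for illustration, not the book's dataset. The text is read from an in-memory buffer so that the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content. With a real file you would pass its path
# instead, for example pd.read_csv('students.csv') (assumed file name).
csv_text = """Name,BirthDate,MathsScore,PhysicsScore
Alice,2000-01-30,90,87
Bob,2001-02-03,85,92
Charlie,2001-04-05,,89
"""

# parse_dates asks read_csv to convert the listed columns to datetimes.
# The empty MathsScore field for Charlie is loaded as a missing value.
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['BirthDate'])

print(df_csv.dtypes)
print(df_csv)
```

Passing parse_dates at load time plays the same role as the pd.to_datetime() call in Listing B.1, so the date column does not need a separate conversion step.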

B.2 dt

B.3 groupby()

B.4 isnull()

B.5 melt()

B.6 unique()