appendix B Python pandas DataFrame
This appendix provides an overview of the pandas DataFrame and describes the DataFrame methods used in this book.
B.1 An overview of pandas DataFrame
Python pandas is a library for loading, manipulating, analyzing, and visualizing data. In this book, we use the pandas DataFrame, a two-dimensional data structure composed of rows and columns. The DataFrame stores data in tabular form, so you can manipulate, analyze, filter, and aggregate it quickly and easily.
There are different ways to create a pandas DataFrame. In this book, we consider two ways: from a Python dictionary and from a CSV file. You can download the code described in this appendix from the GitHub repository for the book under AppendixB/Pandas DataFrame.ipynb.
B.1.1 Building from a dictionary
Listing B.1 Creating a DataFrame from a dictionary
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
    'Grade': ['A', 'B', 'A']
}                                  #1
df = pd.DataFrame(data)    #2
df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')    #3 
   
 Note  Use pd.DataFrame() to create a new DataFrame from a dictionary. Each key becomes a column name, and each list of values becomes the data in that column.
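To check what listing B.1 produces, you can inspect the shape, column types, and missing values of the resulting DataFrame. The following is a minimal sketch that rebuilds the same dictionary; the variable names dtypes and missing are introduced here for illustration only and are not part of the book's listing.

```python
import pandas as pd

# Same data as in listing B.1
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
    'Grade': ['A', 'B', 'A'],
}
df = pd.DataFrame(data)
df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')

# Number of rows and columns
print(df.shape)  # (3, 6)

# Data type that pandas assigned to each column (illustrative variable name)
dtypes = df.dtypes
print(dtypes)

# Count of missing values per column (illustrative variable name)
missing = df.isna().sum()
print(missing)
```

Because MathsScore and ChemistryScore each contain a None, pandas stores them as floating-point columns with NaN in the missing position, while PhysicsScore, which has no missing values, remains an integer column.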