appendix B Python pandas DataFrame

 

This appendix gives an overview of the pandas DataFrame and the methods used in this book.

B.1 An overview of pandas DataFrame

pandas is a Python library for loading, manipulating, analyzing, and visualizing data. In this book, we use the pandas DataFrame, a two-dimensional structure composed of rows and columns. The DataFrame stores data in tabular form, so you can filter, aggregate, and analyze it quickly and easily.

There are different ways to create a pandas DataFrame. In this book, we consider two ways: from a Python dictionary and from a CSV file. You can download the code described in this appendix from the GitHub repository for the book under AppendixB/Pandas DataFrame.ipynb.

B.1.1 Building from a dictionary

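The following listing creates a pandas DataFrame from a Python dictionary. It defines a dictionary whose keys become the column names and whose lists become the column values (#1), builds the DataFrame from it (#2), and converts the BirthDate column from strings to datetime values (#3).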

Listing B.1 Creating a DataFrame from a dictionary
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
    'Grade': ['A', 'B', 'A']
}                                  #1

df = pd.DataFrame(data)    #2

df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')    #3
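To check what the listing produces, it can help to inspect the resulting DataFrame. The following is a minimal sketch that rebuilds the same DataFrame and looks at its shape, column types, and missing values. The variable names shape, dtypes, and missing are chosen here for illustration only.

```python
import pandas as pd

# Same data as in listing B.1
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
    'Grade': ['A', 'B', 'A']
}

df = pd.DataFrame(data)
df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')

shape = df.shape             # (number of rows, number of columns)
dtypes = df.dtypes           # data type of each column
missing = df.isnull().sum()  # count of missing values per column

print(shape)
print(dtypes)
print(missing)
```

Each None in a numeric column is stored as a missing value (NaN), so MathsScore and ChemistryScore hold floating-point numbers, while PhysicsScore, which has no missing values, keeps integers.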

B.1.2 Building from a CSV file
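As a rough illustration of loading a DataFrame from CSV data, the sketch below uses pd.read_csv(). It is not the book's listing: the CSV content is an assumption that mirrors the dictionary used in listing B.1, and it is read from an in-memory string (csv_text, a hypothetical name) so the example runs without a file on disk. With a real file, you would pass its path to read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical CSV content mirroring the data in listing B.1;
# empty fields stand for missing scores
csv_text = """Name,BirthDate,MathsScore,PhysicsScore,ChemistryScore,Grade
Alice,2000-01-30,90,87,92,A
Bob,2001-02-03,85,92,,B
Charlie,2001-04-05,,89,90,A
"""

# read_csv accepts a file path or a file-like object;
# parse_dates converts the listed columns to datetime values
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['BirthDate'])

print(df_csv)
print(df_csv.dtypes)
```

Empty fields in the CSV are read as missing values, just as None is in the dictionary version.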

B.2 dt

B.3 groupby()

B.4 isnull()

B.5 melt()

B.6 unique()