Appendix B. Python Pandas DataFrame
This appendix covers
- An overview of the Pandas DataFrame
- The DataFrame methods used in this book
This appendix gives an overview of the Pandas DataFrame and describes the DataFrame methods used in this book.
B.1 An Overview of the Pandas DataFrame
Python Pandas is a library for loading, manipulating, analyzing, and visualizing data. In this book, we use the Pandas DataFrame, a two-dimensional data structure composed of rows and labeled columns. Because the DataFrame stores data in tabular form, you can filter, transform, and aggregate it with a few method calls.
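To make the tabular model concrete, the following minimal sketch builds a small DataFrame, filters its rows, and aggregates one column. The dataset and the variable names (scores, high_scores, mean_by_grade) are hypothetical and serve only as an illustration; they are not part of the book's examples.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
scores = pd.DataFrame({
    'Name': ['Ann', 'Ben', 'Cleo', 'Dan'],
    'Grade': ['A', 'B', 'A', 'B'],
    'Score': [91, 78, 88, 82],
})

# Filter: keep only the rows whose Score is greater than 80
high_scores = scores[scores['Score'] > 80]

# Aggregate: compute the average Score for each Grade
mean_by_grade = scores.groupby('Grade')['Score'].mean()

print(scores.shape)                  # (number of rows, number of columns)
print(high_scores['Name'].tolist())  # names of the rows that passed the filter
print(mean_by_grade)                 # one average per Grade
```

Filtering with a boolean condition and aggregating with groupby() are two common DataFrame operations; other approaches, such as query() or pivot_table(), could be used for the same tasks.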
There are different ways to create a Pandas DataFrame. In this book, we consider two ways: from a Python dictionary and from a CSV file. You can download the code described in this appendix from the GitHub repository of the book, under AppendixB/Pandas DataFrame.ipynb.
B.1.1 Building from a Dictionary
Consider the following example, which creates a Pandas DataFrame from a Python dictionary:
Listing B.1 Creating a DataFrame from a dictionary
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
    'Grade': ['A', 'B', 'A']
}    #A
df = pd.DataFrame(data)    #B
df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')    #C
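One way to check what Listing B.1 produces is to inspect the shape, the column types, and the missing values of the resulting DataFrame. The sketch below repeats the listing so that it runs on its own; the inspection calls at the end are an addition for illustration and are not part of the original listing.

```python
import pandas as pd

# Same data as in Listing B.1
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
    'Grade': ['A', 'B', 'A']
}
df = pd.DataFrame(data)
df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')

print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # type of each column
print(df.isna().sum())  # count of missing values per column
```

Note that Pandas stores each None in a numeric column as NaN, so MathsScore and ChemistryScore become floating-point columns, while PhysicsScore, which has no missing values, stays an integer column.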