appendix B Python pandas DataFrame
This appendix gives an overview of the pandas DataFrame and of the methods used in this book.
B.1 An overview of pandas DataFrame
Python pandas is a library for data manipulation, analysis, and visualization. It provides tools to load data and then manipulate, analyze, and visualize it. In this book, we use the pandas DataFrame, a two-dimensional structure composed of rows and columns. The DataFrame stores data in tabular form, so you can manipulate, analyze, filter, and aggregate it quickly and easily.
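As a quick illustration of the tabular structure and of the filter and aggregate operations mentioned above, the following minimal sketch uses a small, hypothetical set of student scores (the data is made up for this example and is not taken from the book). It relies only on standard pandas operations: the shape attribute, boolean-mask filtering, and groupby with mean.

```python
import pandas as pd

# Hypothetical example data (not from the book): three students and one score each
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [90, 85, 78],
    'Grade': ['A', 'B', 'A'],
})

# A DataFrame is two-dimensional: shape is (number of rows, number of columns)
print(df.shape)  # (3, 3)

# Filter: keep only the rows whose Score is greater than 80
high_scores = df[df['Score'] > 80]
print(high_scores)

# Aggregate: compute the mean Score for each Grade
mean_by_grade = df.groupby('Grade')['Score'].mean()
print(mean_by_grade)
```

The boolean mask df['Score'] > 80 selects rows, while groupby splits the rows by Grade before computing one mean per group.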
There are different ways to create a pandas DataFrame. In this book, we consider two ways: from a Python dictionary and from a CSV file. You can download the code described in this appendix from the GitHub repository for the book under AppendixB/Pandas DataFrame.ipynb.
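For the second approach, pandas provides read_csv. The sketch below is an assumption-laden illustration rather than the book's own example: it reads a small, hypothetical CSV held in a string through io.StringIO so that it runs without any file on disk. In practice you would pass a file path instead (the name students.csv in the comment is hypothetical).

```python
import io
import pandas as pd

# Hypothetical CSV content; with a real file you would write
# pd.read_csv('students.csv', parse_dates=['BirthDate'])  (hypothetical file name)
csv_text = """Name,BirthDate,MathsScore
Alice,2000-01-30,90
Bob,2001-02-03,85
Charlie,2001-04-05,
"""

# read_csv accepts a path or any file-like object;
# parse_dates converts the listed columns to a datetime dtype
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['BirthDate'])

# The empty MathsScore field for Charlie is read as a missing value (NaN)
print(df)
print(df.dtypes)
```

Empty fields in a CSV become NaN, which is why the MathsScore column is stored as a floating-point column here.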
B.1.1 Building from a dictionary
Listing B.1 Creating a DataFrame from a dictionary
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
    'Grade': ['A', 'B', 'A']
}   #1

df = pd.DataFrame(data)   #2

df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')   #3
Note Use DataFrame() to create a new DataFrame from a dictionary.
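To check what Listing B.1 actually produces, you can inspect the column types and count the missing values. The following sketch repeats the listing's data and adds two standard inspection calls, dtypes and isna().sum(); it is an illustrative check, not part of the book's notebook.

```python
import pandas as pd

# Same data as Listing B.1
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'BirthDate': ['2000-01-30', '2001-02-03', '2001-04-05'],
    'MathsScore': [90, 85, None],
    'PhysicsScore': [87, 92, 89],
    'ChemistryScore': [92, None, 90],
    'Grade': ['A', 'B', 'A']
}
df = pd.DataFrame(data)
df['BirthDate'] = pd.to_datetime(df['BirthDate'], format='%Y-%m-%d')

# None in a numeric column is stored as NaN, so MathsScore and
# ChemistryScore become float64, while PhysicsScore stays int64
print(df.dtypes)

# Number of missing values in each column
missing = df.isna().sum()
print(missing)
```

The None entries are the only missing values, so MathsScore and ChemistryScore each report one, and every other column reports zero.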