1 Introducing pandas


This chapter covers

  • The growth of data science in the 21st century
  • The history of the pandas library for data analysis
  • The pros and cons of pandas and its competitors
  • Data analysis in Excel versus data analysis with a programming language
  • A tour of the library’s features through a working example

Welcome to Pandas in Action! Pandas is a library for data analysis built on top of the Python programming language. A library (also called a package) is a collection of code for solving problems in a specific field of endeavor. Pandas is a toolbox for data manipulation operations such as sorting, filtering, cleaning, deduping, aggregating, and pivoting. The epicenter of Python’s vast data science ecosystem, pandas pairs well with other libraries for statistics, natural language processing, machine learning, data visualization, and more.
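To make that toolbox a little more concrete, here is a minimal sketch of a few of those operations. The `sales` data set and its column names are hypothetical, invented only for this illustration. Treat the snippet as a quick taste rather than a recommended workflow.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
sales = pd.DataFrame(
    {
        "city": ["Boston", "Austin", "Boston", "Austin", "Boston"],
        "product": ["pen", "pen", "ink", "ink", "pen"],
        "units": [3, 5, 2, 4, 3],
    }
)

# Deduping: the last row repeats the first, so it is dropped.
deduped = sales.drop_duplicates()

# Sorting: largest orders first.
sorted_sales = deduped.sort_values("units", ascending=False)

# Filtering: keep only orders of four or more units.
large_orders = deduped[deduped["units"] >= 4]

# Aggregating: total units sold per city.
units_by_city = deduped.groupby("city")["units"].sum()

# Pivoting: cities as rows, products as columns, summed units as values.
pivoted = deduped.pivot_table(
    index="city", columns="product", values="units", aggfunc="sum"
)

print(units_by_city)
print(pivoted)
```

Each operation returns a new object and leaves `sales` unchanged, which is why the results are assigned to separate variables.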

In this introductory chapter, we’ll explore the history and evolution of modern data analytics tools. We’ll see how pandas grew from one financial analyst’s pet project to an industry standard used by companies such as Stripe, Google, and J.P. Morgan. We’ll compare the library with its competitors, including Excel and R. We’ll discuss the differences between working with a programming language and working with a graphical spreadsheet application. Finally, we’ll use pandas to analyze a real-world data set. Consider this chapter a sneak preview of the concepts you’ll master throughout the book. Let’s dive in!

1.1 Data in the 21st century

1.2 Introducing pandas

1.2.1 Pandas vs. graphical spreadsheet applications

1.2.2 Pandas vs. its competitors

1.3 A tour of pandas

1.3.1 Importing a data set

1.3.2 Manipulating a DataFrame

1.3.3 Counting values in a Series

1.3.4 Filtering a column by one or more criteria

1.3.5 Grouping data