Welcome to Pandas in Action! Pandas is a library for data analysis built on top of the Python programming language. A library (also called a package) is a collection of code for solving problems in a specific field of endeavor. Pandas is a toolbox for data manipulation operations: sorting, filtering, cleaning, deduping, aggregating, pivoting, and more. The epicenter of Python’s vast data science ecosystem, pandas pairs well with other libraries for statistics, natural language processing, machine learning, data visualization, and more.
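To make that list of operations a little more concrete, here is a minimal sketch that runs each one on a tiny table of sales records. The table, its column names (`region`, `product`, `units`), and its values are invented purely for illustration. The methods it calls (`fillna`, `drop_duplicates`, `sort_values`, `groupby`, and `pivot_table`) are standard pandas `DataFrame` and `Series` methods.

```python
import pandas as pd

# Hypothetical sales records, made up only for this sketch.
sales = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West", "East", "East"],
        "product": ["Pens", "Pens", "Paper", "Paper", "Pens", "Pens"],
        "units": [10, 4, None, 7, 10, 3],
    }
)

# Cleaning: treat the missing unit count as 0 and store whole numbers.
sales["units"] = sales["units"].fillna(0).astype(int)

# Deduping: drop rows that exactly repeat an earlier row.
sales = sales.drop_duplicates()

# Filtering: keep only rows where at least one unit was sold.
sold = sales[sales["units"] > 0]

# Sorting: order the remaining rows by units, largest first.
ranked = sold.sort_values("units", ascending=False)

# Aggregating: total units sold in each region.
totals = sold.groupby("region")["units"].sum()

# Pivoting: regions as rows, products as columns, summed units as values.
table = sold.pivot_table(
    index="region", columns="product", values="units", aggfunc="sum", fill_value=0
)

print(ranked)
print(totals)
print(table)
```

Each step is a single method call on a `DataFrame` or `Series`, which is much of what makes these operations quick to express in pandas.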
In this introductory chapter, we’ll explore the history and evolution of modern data analytics tools. We’ll see how pandas grew from one financial analyst’s pet project to an industry standard used by companies such as Stripe, Google, and J.P. Morgan. We’ll compare the library with its competitors, including Excel and R. We’ll discuss the differences between working with a programming language and working with a graphical spreadsheet application. Finally, we’ll use pandas to analyze a real-world data set. Consider this chapter to be a sneak preview of the concepts you’ll master throughout the book. Let’s dive in!