1 Introducing Pandas
This chapter covers:
- The role of data in the 21st century
- Popular solutions for data analysis including pandas, Excel, R, and SAS
- The advantages and disadvantages of pandas relative to its competitors
Welcome to Pandas In Action! Pandas is a popular library for data analysis built on top of the Python programming language. A library is a collection of code designed to solve a specific problem. One piece of Python's vast data science ecosystem, pandas pairs well with libraries for statistics, natural language processing, machine learning, visualization, and more.
In this chapter, we’ll look at the history and evolution of tools for working with big data. We’ll explore how pandas grew from one financial analyst’s pet project to an industry standard used by companies like Netflix[1], Stripe[2], Google, Facebook, and J.P. Morgan[3]. Finally, we’ll compare the library with its competitors and analyze its relative strengths and weaknesses.
1.1 Data in the 21st Century
"It is a capital mistake to theorize before one has data," Sherlock Holmes advises his assistant, John Watson, in A Scandal in Bohemia, the first of Sir Arthur Conan Doyle’s classic short stories pairing the duo. "Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."[4]