1 Introducing Pandas
This chapter covers:
· The growth of data science in the 21st century
· The introduction of the pandas library for data analysis
· The advantages and disadvantages of pandas relative to its competitors
· The differences between working in Excel vs. a programming language
· The basics of the DataFrame and the Series, the two primary objects in pandas
· A tour of the library's features through a working example
Welcome to Pandas in Action! Pandas is a popular library for data analysis built on top of the Python programming language. A library is a collection of code designed to solve a specific but common business problem. You can think of pandas as a digital toolbox that holds various tools for working with data. As one piece of Python's vast data science ecosystem, pandas pairs well with other libraries for statistics, natural language processing, machine learning, visualization, and more.
In this chapter, we'll take a look at the history and evolution of tools for working with big data. We'll explore how pandas grew from one financial analyst's pet project to an industry standard used by companies like Netflix[1], Stripe[2], Google, Facebook, and J.P. Morgan[3]. We'll compare the library's strengths and weaknesses with those of its competitors. Finally, we'll see what pandas is capable of by analyzing a real-world dataset; consider it a sneak peek of the concepts covered throughout the book.