Chapter 24. Exploring data


This chapter covers

  • Python’s advantages for handling data
  • Jupyter Notebook
  • pandas
  • Data aggregation
  • Plots with matplotlib

Over the past few chapters, I’ve dealt with some aspects of using Python to get and clean data. Now it’s time to look at a few of the things that Python can help you do to manipulate and explore data.

24.1. Python tools for data exploration

In this chapter, we’ll look at some common Python tools for data exploration: Jupyter Notebook, pandas, and matplotlib. I can only touch briefly on a few features of each, but the aim is to give you an idea of what’s possible and a starting point for exploring data with Python.

24.1.1. Python’s advantages for exploring data

Python has become one of the leading languages for data science and continues to grow in that area. As I’ve mentioned, Python isn’t always the fastest language in terms of raw performance. Some data-crunching libraries, however, such as NumPy, are largely written in C and so heavily optimized that speed is rarely an issue. In addition, considerations such as readability and accessibility often outweigh pure speed; minimizing the amount of developer time needed is often more important. Python is readable and accessible, and both on its own and in combination with tools developed in the Python community, it’s an enormously powerful tool for manipulating and exploring data.
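
As a rough illustration of the point about optimized libraries, the following sketch computes the same average twice: once with an explicit Python loop and once with NumPy. The data set, the size N, and the helper names mean_pure_python and mean_numpy are made up for this example, and it assumes NumPy is installed. The timings it prints will vary from machine to machine, so treat them as an indication rather than a benchmark.

```python
import timeit

import numpy as np

# Hypothetical data: one million floating-point "measurements".
N = 1_000_000
values = [float(i) for i in range(N)]
array = np.array(values)


def mean_pure_python(data):
    """Average computed with an explicit Python-level loop."""
    total = 0.0
    for x in data:
        total += x
    return total / len(data)


def mean_numpy(data):
    """Average computed by NumPy's compiled mean routine."""
    return float(data.mean())


py_result = mean_pure_python(values)
np_result = mean_numpy(array)

# Time a few repetitions of each; absolute numbers depend on the machine.
py_time = timeit.timeit(lambda: mean_pure_python(values), number=3)
np_time = timeit.timeit(lambda: mean_numpy(array), number=3)

print(f"pure Python: {py_result:.1f} in {py_time:.3f}s")
print(f"NumPy:       {np_result:.1f} in {np_time:.3f}s")
```

Both versions should report the same average. The NumPy call is typically much faster because the loop over the elements runs in compiled code rather than in the Python interpreter, while the Python version arguably stays just as readable.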

24.1.2. Python can be better than a spreadsheet

24.2. Jupyter Notebook

24.3. Python and pandas

24.4. Data cleaning

24.5. Data aggregation and manipulation

24.6. Plotting data

24.7. Why you might not want to use pandas