Chapter 3 Exploring data

 

Chapter 24 from The Quick Python Book, 3rd edition by Naomi Ceder.

This chapter covers:

    Python’s advantages for handling data

    Jupyter Notebook

    pandas

    Data aggregation

    Plots with matplotlib

    Over the past few chapters, I’ve dealt with some aspects of using Python to get and clean data. Now it’s time to look at some of the ways Python can help you manipulate and explore that data.

    24.1 Python tools for data exploration

    In this chapter, we’ll look at some common Python tools for data exploration: Jupyter Notebook, pandas, and matplotlib. I can only touch briefly on a few features of each, but the aim is to give you an idea of what’s possible and a starting set of tools for exploring data with Python.

    24.1.1 Python’s advantages for exploring data

    Python has become one of the leading languages for data science and continues to grow in that area. As I’ve mentioned, however, Python isn’t always the fastest language in terms of raw performance. On the other hand, some data-crunching libraries, such as NumPy, are largely written in C and so heavily optimized that speed is rarely an issue. In addition, readability and accessibility often outweigh pure speed; minimizing the developer time needed is often more important than minimizing run time. Python is readable and accessible, and both on its own and in combination with tools developed in the Python community, it’s an enormously powerful tool for manipulating and exploring data.
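    To make the point about speed a little more concrete, here’s a minimal sketch that runs the same calculation twice: once as a plain Python loop and once with NumPy. It assumes NumPy is installed; the function names, the array size, and the choice of a sum of squares are mine, picked only for illustration.

```python
import timeit

import numpy as np

N = 100_000  # arbitrary size, chosen only for illustration


def sum_of_squares_python(values):
    """Sum the squares of the values with a plain Python loop."""
    total = 0
    for value in values:
        total += value * value
    return total


def sum_of_squares_numpy(array):
    """Do the same calculation inside NumPy's compiled code."""
    return int(np.dot(array, array))


values = list(range(N))
array = np.arange(N, dtype=np.int64)

# Both approaches should agree on the answer.
assert sum_of_squares_python(values) == sum_of_squares_numpy(array)

# Time ten runs of each version; the numbers depend on your machine.
loop_time = timeit.timeit(lambda: sum_of_squares_python(values), number=10)
numpy_time = timeit.timeit(lambda: sum_of_squares_numpy(array), number=10)
print(f"Python loop: {loop_time:.4f} s")
print(f"NumPy:       {numpy_time:.4f} s")
```

    On most machines I’d expect the NumPy version to finish considerably faster, but the exact timings will vary, so treat this as a rough comparison rather than a benchmark.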

    24.1.2 Python can be better than a spreadsheet

    24.2 Jupyter notebook

    24.2.1 Starting a kernel

    24.3 Python and pandas

    24.3.1 Why you might want to use pandas

    24.3.2 Installing pandas

    24.3.3 Data frames

    24.4 Data cleaning

    24.4.1 Loading and saving data with pandas

    24.4.2 Data cleaning with a data frame

    24.5 Data aggregation and manipulation

    24.5.2 Selecting data

    24.5.3 Grouping and aggregation

    24.6 Plotting data

    24.7 Why you might not want to use pandas

    Summary