Part 2 Working with structured data using Dask DataFrames

 

Now that you have a basic understanding of how Dask makes it possible to both work with large datasets and take advantage of parallelism, you’re ready to get some hands-on experience working with a real dataset to learn how to solve common data science challenges with Dask. Part 2 focuses on Dask DataFrames—a parallelized implementation of the ever-popular Pandas DataFrame—and how to use them to clean, analyze, and visualize large structured datasets.

Chapter 3 opens the part by explaining how Dask parallelizes Pandas DataFrames and describing why some parts of the Dask DataFrame API differ from their Pandas counterparts.

Chapter 4 jumps into the first part of the data science workflow by addressing how to read data into DataFrames from various data sources.

Chapter 5 continues the workflow by diving into common data manipulation and cleaning tasks, such as sorting, filtering, recoding, and filling in missing data.

Chapter 6 demonstrates how to generate descriptive statistics using some built-in functions, as well as how to build your own custom aggregate and window functions.

Chapters 7 and 8 close out Part 2 by taking you from basic visualizations through advanced, interactive visualizations, including plotting location-based data on a map.
