This is an online version of the Manning book Mastering Large Datasets with Python: Parallelize and Distribute Your Python Code.
Thanks for purchasing this book. I’m really excited about its contents and I’m even more excited that you’re excited about its contents.
This book was born out of several experiences I’ve had working with junior developers, software engineers, and data scientists in academia and industry. In it, I try to address what I see as some of the largest sticking points for programmers moving from solo work to teamwork, and from independent projects to industrial-scale problems.
The tools and technologies introduced in this book—map, reduce, parallel programming, Hadoop, Spark, and cloud computing on AWS—are force multipliers for the programming skills you already have. After reading this book you should be able to take on Big Problems, whatever that means for the specific domain you develop in.
With the versatility of these tools and technologies in mind, the scenarios in this book cover a wide range of tasks. As you follow along, you’ll also be exposed to web scraping, social-media mining, financial and scientific simulations, and machine learning. I hope that the variety of these applications will inspire you to employ map, reduce, and parallel computing creatively in your own work.