This chapter covers
- Launching and configuring cloud compute clusters with Elastic MapReduce
- Running Hadoop jobs in the cloud with mrjob
- Distributed cloud machine learning with Spark
Throughout this book, we’ve been talking about the ability to scale code up. We started by looking at how to parallelize code locally; then we moved on to distributed computing frameworks; and finally, in chapter 11, we introduced cloud computing technologies. In this chapter, we’ll look at techniques we can use to work with data of any scale. We’ll see how to take the Hadoop and Spark frameworks we covered in the middle of the book (chapters 7 and 8 for Hadoop; chapters 7, 9, and 10 for Spark) and bring them into the cloud with Amazon Elastic MapReduce. We’ll start by looking at how to bring Hadoop into the cloud with mrjob—a framework for Hadoop and Python that we introduced in chapter 8. Then, we’ll look at bringing Spark and its machine learning capabilities into the cloud.
In chapter 8, we reviewed two methods of working with Hadoop:
- Hadoop Streaming, which runs standalone Python scripts as its mappers and reducers (see the first sketch after this list)
- mrjob, which lets us write entire Hadoop jobs using only Python code (see the second sketch)
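As a quick refresher, a Hadoop Streaming job is just a pair of executables that read from stdin and write tab-separated key-value pairs to stdout. The following word-count sketch is a minimal example of that pattern; the file names mapper.py and reducer.py are placeholders of our own choosing:

```python
# mapper.py -- emit one (word, 1) pair per word, tab-separated on stdout
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py -- Hadoop sorts map output by key, so identical words arrive together
import sys
from itertools import groupby

def parsed(lines):
    for line in lines:
        word, count = line.rstrip("\n").split("\t", 1)
        yield word, int(count)

for word, pairs in groupby(parsed(sys.stdin), key=lambda pair: pair[0]):
    print(f"{word}\t{sum(count for _, count in pairs)}")
```

On a cluster, Hadoop wires these scripts together through the hadoop-streaming JAR's -mapper and -reducer options.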
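With mrjob, the same job collapses into a single Python class. This minimal sketch follows mrjob's standard mapper/reducer pattern; the module name mr_word_count.py is our own:

```python
# mr_word_count.py -- a complete Hadoop job defined as one mrjob class
from mrjob.job import MRJob

class MRWordCount(MRJob):

    def mapper(self, _, line):
        # The input key is ignored; each value is one line of text.
        for word in line.split():
            yield word, 1

    def reducer(self, word, counts):
        # mrjob groups values by key before calling the reducer.
        yield word, sum(counts)

if __name__ == "__main__":
    MRWordCount.run()
```

Run it locally with python mr_word_count.py input.txt; the same file, unchanged, can target a real cluster by switching runners.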
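That runner switch is what makes mrjob a natural bridge to Elastic MapReduce. The sketch below is a minimal example of selecting the EMR runner with -r emr, assuming AWS credentials are already configured for mrjob; the S3 paths are placeholders:

```python
# run_on_emr.py -- launch the word-count job on EMR programmatically
# (assumes AWS credentials are configured; S3 paths are placeholders)
from mr_word_count import MRWordCount

job = MRWordCount(args=[
    "-r", "emr",                                # use mrjob's Elastic MapReduce runner
    "s3://my-bucket/input/",                    # read input from S3
    "--output-dir", "s3://my-bucket/output/",   # write results back to S3
])

with job.make_runner() as runner:
    runner.run()  # provisions a cluster, runs the job, and waits for it to finish
    for word, count in job.parse_output(runner.cat_output()):
        print(word, count)
```

The same class can also be launched straight from the command line with python mr_word_count.py -r emr s3://my-bucket/input/.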