Chapter 9. Running Hadoop in the cloud

This chapter covers

Setting up a compute cloud with Amazon Web Services (AWS)
Running Hadoop in the AWS cloud
Transferring data into and out of an AWS Hadoop cloud

Depending on your data processing needs, your Hadoop workload can vary widely over time. You may have a few large data processing jobs that occasionally take advantage of hundreds of nodes, but those same nodes sit idle the rest of the time. You may be new to Hadoop and want to get familiar with it before investing in a dedicated cluster. You may own a startup that needs to conserve cash and wants to avoid the capital expense of a Hadoop cluster. In these and other situations, it makes more sense to rent a cluster of machines than to buy one.

9.1. Introducing Amazon Web Services

9.2. Setting up AWS

9.3. Setting up Hadoop on EC2

9.4. Running MapReduce programs on EC2

9.5. Cleaning up and shutting down your EC2 instances

9.6. Amazon Elastic MapReduce and other AWS services

9.7. Summary