Chapter 11. Large datasets in the cloud with Amazon Web Services and S3

 

This chapter covers

  • Understanding distributed object storage in the cloud
  • Using the AWS web interface to set up buckets and upload objects
  • Working with the boto3 library to upload data to an S3 bucket

In chapters 7–10, we saw the power of distributed frameworks such as Hadoop and Spark. These frameworks can take advantage of clusters of computers to parallelize massive data processing tasks and complete them in short order. Most of us, however, don’t have access to physical compute clusters.

In contrast, we can all get access to compute clusters from cloud service providers such as Amazon, Microsoft, and Google. These cloud providers have platforms that we can use for storing and processing data, along with a variety of services that automate common tasks we may want to do. In this chapter, we’ll take the first step of analyzing big data in the cloud by uploading data to Amazon’s Simple Storage Service (S3). First, we’ll review the basics of S3; then we’ll create a bucket and upload an object using the browser-based AWS console; and finally we’ll upload several objects to a bucket with the boto3 software development kit.
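As a preview of the boto3 workflow described above, here is a minimal sketch of uploading several local files to an S3 bucket. The bucket name, key prefix, and file paths are hypothetical placeholders, and running the upload itself requires valid AWS credentials:

```python
# Sketch of uploading local files to S3 with boto3 (AWS SDK for Python).
# Bucket name, prefix, and paths below are illustrative, not real resources.
import os

def object_key_for(path, prefix="raw-data"):
    """Map a local file path to an S3 object key under a common prefix."""
    return f"{prefix}/{os.path.basename(path)}"

def upload_files(paths, bucket="my-example-bucket"):
    """Upload each local file to the given bucket (needs AWS credentials)."""
    import boto3  # imported here so key building works even without boto3
    s3 = boto3.client("s3")  # uses credentials from your AWS configuration
    for path in paths:
        # upload_file streams the file to S3 under the computed object key
        s3.upload_file(path, bucket, object_key_for(path))

# Example call (would perform real uploads against a real bucket):
# upload_files(["data/part-0001.csv", "data/part-0002.csv"])
```

Keeping the key-building logic in its own small function makes the naming scheme easy to test and change independently of the upload itself.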

11.1. AWS Simple Storage Service—A solution for large datasets

 
 

11.2. Storing data in the cloud with S3

 

11.3. Exercises

 
 

Summary

 
 