Introduction
There’s been a subtle revolution in computing as the major cloud providers (Amazon, Google, and Microsoft) have, over the last decade and change, supplied increasingly cheap and increasingly dependable object storage. Unlike the file storage you may be familiar with on your local machine, this method of storage treats discrete units of data as objects and stores them in buckets along with their metadata. Your cloud service provider handles most of the underlying details, letting you focus on other parts of your application. This has made storing large datasets, such as the massive media libraries behind services like Netflix, Amazon, Google, Facebook, Twitter, Vine, Snapchat, and Spotify, not only possible but straightforward. Indeed, object storage is in many ways the linchpin of the big data era we live in today.
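To make the idea of objects, buckets, and metadata concrete, here is a minimal in-memory sketch. It is a toy model for illustration only: the `Bucket` and `StoredObject` names and their methods are hypothetical and do not correspond to any provider's SDK.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the raw bytes plus descriptive metadata (hypothetical type)."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class Bucket:
    """A flat namespace mapping keys to objects; a toy stand-in for a real bucket."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data, **metadata):
        # Writing stores (or replaces) the whole object under its key.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are only a convention: keys that share a common prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = Bucket("media-library")
bucket.put("songs/track-01.mp3", b"\x00\x01", content_type="audio/mpeg")
bucket.put("movies/trailer.mp4", b"\x02\x03", content_type="video/mp4")

song = bucket.get("songs/track-01.mp3")
print(song.metadata["content_type"], bucket.list_keys("songs/"))
```

The point of the sketch is the shape of the data: each object is addressed by a key within a bucket and carries its metadata with it, rather than living in a hierarchy of directories.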
With the goal of creating a primer on object storage and its many uses, I chose chapters from four great Manning books that take a look at object storage at all three major cloud providers: Amazon Web Services, Google Cloud Platform, and Microsoft Azure.
My aim with this booklet is for you to become comfortable developing with object storage regardless of the cloud provider. Additionally, exploring all three cloud providers’ object storage offerings will empower you as a developer to make your own decisions about which platform is most appropriate for your use case.
This booklet starts by looking at the Google Cloud Platform’s object storage: Google Cloud Storage, a system that The New York Times entrusts with much of its digital storage. This chapter comes from the book Google Cloud Platform in Action by JJ Geewax, and it provides an excellent introduction to the what, why, how, and when of cloud storage.
Next, with the chapter “Storing your objects: S3 and Glacier” from Amazon Web Services in Action, Second Edition by Michael Wittig and Andreas Wittig, you’ll learn about Amazon Web Services’ object storage, Amazon S3, as well as some of the specifics of this highly popular distributed data store, like transferring files to S3 using the terminal and integrating S3 into your applications with SDKs. You’ll also gain first-hand experience building and serving a webpage using S3.
Then, we’ll look at Microsoft Azure’s object storage service: Azure Blob Storage. The Azure look and feel is my personal favorite, and Azure Blob Storage is trusted by Ubisoft, the game studio behind the critically acclaimed Assassin’s Creed and Far Cry series, for both game data and game log storage. This chapter comes from the book Azure Data Engineering by Richard L. Nuckolls. Here you’ll have the opportunity to see the nuances of object storage in Azure.
Lastly, we’ll learn how to use object storage for big data analytics with Amazon EMR, a service for running Hadoop and Spark jobs in the cloud. In this chapter, which comes from my own book, Mastering Large Datasets with Python, object storage holds the data that those Hadoop and Spark jobs process.
By its nature, object storage has many benefits over other storage methods, including support for large-scale data analytics, virtually unlimited scalability, quick data retrieval, and reduced cost. This booklet will help you choose the right provider for your tasks and get you comfortable developing with object storage.