Part 5. Data processing and analytics

 

Large-scale data processing has grown in importance ever since Big Data became a buzzword. As you might guess, processing and analyzing loads of data (measured in terabytes, petabytes, or more) is a complicated job. In this part we’ll explore some of the tools available on Google Cloud Platform that were designed to simplify this work.

We’ll start by looking at BigQuery, which allows you to query immense amounts of data quickly, and then move on to Cloud Dataflow, where you can take your Apache Beam data-processing pipelines and execute them on Google’s infrastructure. Finally, we’ll look at how you can communicate across many systems using Cloud Pub/Sub as the glue between your various data-processing jobs.