3 Getting bigger and leveraging the Big 3: Amazon, Microsoft Azure, and Google

 

This chapter covers

  • Designing a flexible and scalable six-layer data platform architecture
  • Understanding how layers support both batch and streaming data
  • Ensuring the right foundational components for easier management
  • Implementing a modern cloud data platform in AWS, Google, or Azure

Chapter 2 covered setting up a simple data platform made up of a data lake and a data warehouse in the cloud, with simple batch pipelines to ingest data. It also laid out the pros and cons of a data lake versus a data warehouse versus a combination of the two to produce the best analysis outcomes.

In this chapter, we’ll build on the data platform architecture concepts introduced in chapters 1 and 2 and layer on some of the critical, more advanced functionality that most data platforms need today. Without this added sophistication, your data platform would work, but it wouldn’t scale easily, nor would it meet the growing data velocity challenges discussed in chapter 1. It would also be limited in the types of data consumers (the people and systems that consume data from the platform) it supports, and those consumers are growing in both number and variety.

3.1 Cloud data platform layered architecture

3.1.1 Data ingestion layer

3.1.2 Fast and slow storage

3.1.3 Processing layer

3.1.4 Technical metadata layer

3.1.5 The serving layer and data consumers

3.1.6 Orchestration and ETL overlay layers

3.2 The importance of layers in a data platform architecture

3.3 Mapping cloud data platform layers to specific tools

3.3.1 AWS

3.3.2 Google Cloud

3.4.3 Orchestration layer
