4 Orchestration


This chapter covers

  • Building a data ingestion pipeline
  • Introducing Azure Data Factory
  • DevOps for Azure Data Factory
  • Monitoring with Azure Monitor

In this chapter, we’ll look at the final pieces of core infrastructure for our data platform: orchestration and monitoring. DevOps is where we store all our code and configurations and from which we deploy our services. The storage layer is where we ingest data and on top of which we run our workloads. The orchestration layer handles data movement and all other automated processing. Figure 4.1 highlights the platform layer we’ll focus on in this chapter.

Figure 4.1 The orchestration layer handles scheduling for all tasks and data movement into and out of the data platform.

We’ll start with a real-world scenario: ingesting the Bing COVID-19 open dataset into our data platform. Microsoft provides several open datasets for everyone’s use; one of these tracks COVID-19 cases. We’ll use Azure Data Factory (ADF) to build a pipeline that brings this dataset into an Azure Data Explorer (ADX) cluster.
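Before wiring up the pipeline, it helps to get a feel for the data itself. The short Python sketch below downloads the CSV flavor of the dataset with pandas and prints its shape, columns, and first few rows. The blob URL and the preview helper are assumptions for illustration only (the dataset’s published location in Azure Open Datasets may change); they are not part of the ADF pipeline we build in this chapter.

import pandas as pd

# Azure Open Datasets location of the Bing COVID-19 data (CSV flavor).
# NOTE: this path is an assumption based on the public pandemic data lake;
# check the Azure Open Datasets catalog if the download fails.
BING_COVID_CSV = (
    "https://pandemicdatalake.blob.core.windows.net/public/curated/"
    "covid-19/bing_covid-19_data/latest/bing_covid-19_data.csv"
)

def preview(url: str = BING_COVID_CSV, rows: int = 5) -> None:
    """Download the dataset and print its shape, columns, and first rows."""
    df = pd.read_csv(url)
    print(df.shape)
    print(list(df.columns))
    print(df.head(rows))

if __name__ == "__main__":
    preview()

Pulling the file by hand like this is fine for a quick look, but it doesn’t scale: we want the copy into ADX to happen automatically, on a schedule, which is exactly what the orchestration layer is for.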

4.1 Ingesting the Bing COVID-19 open dataset

4.2 Introducing Azure Data Factory

4.2.1 Setting up the data source

4.2.2 Setting up the data sink

4.2.3 Setting up the pipeline

4.2.4 Setting up a trigger

4.2.5 Orchestrating with Azure Data Factory

4.3 DevOps for Azure Data Factory

4.3.1 Deploying Azure Data Factory from Git

4.3.2 Setting up access control

4.3.3 Deploying the production data factory

4.3.4 DevOps for the Azure Data Factory recap

4.4 Monitoring with Azure Monitor

Summary
