This chapter covers:
- Building a single-step processing pipeline
- Using a secret key store
- Scheduling batch data processing
In previous chapters, you’ve learned how to use Azure services to ingest and transform data. Except for Stream Analytics (SA), which automatically processes incoming data, you have added data or triggered processing manually. In this chapter, you’ll learn how to move data between services on a schedule. You’ll learn how to move files between Azure Storage accounts and your Azure Data Lake (ADL). You’ll also learn how to run U-SQL scripts on a schedule to transform data. You’ll use Azure Data Lake Analytics (ADLA) to read and transform data from multiple sources. You’ll learn how to store secrets in Azure Key Vault (AKV). Azure Data Factory (ADF) provides the connections that power this automation.
ADF manages the execution of tasks. These tasks can be as simple as calling a web service endpoint, or as complicated as creating a new server cluster to run custom code and removing it once the code completes. Each task is described by a JSON definition. Tasks and their relationships are defined as follows:
- Each task is called an activity.
- Activities connect to external services using a linked service.
- One or more activities connect to form a pipeline.
- One or more pipelines form a data factory.
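The nesting described above can be sketched with plain Python dictionaries mirroring the shape of ADF's JSON definitions. This is an illustrative sketch, not output from the Azure portal or SDK; the names (`BlobStore`, `CopyToDataLake`, `NightlyImport`, and the dataset references) are hypothetical placeholders.

```python
# Linked service: connection details for an external service
# (here, a hypothetical Blob Storage account named "BlobStore").
linked_service = {
    "name": "BlobStore",
    "properties": {
        "type": "AzureBlobStorage",
        # In practice the secret would come from Azure Key Vault.
        "typeProperties": {"connectionString": "<connection-secret>"},
    },
}

# Activity: a single task. A Copy activity reads from an input
# dataset and writes to an output dataset; each dataset in turn
# uses a linked service to reach its external store.
copy_activity = {
    "name": "CopyToDataLake",
    "type": "Copy",
    "inputs": [{"referenceName": "BlobInputDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "LakeOutputDataset", "type": "DatasetReference"}],
}

# Pipeline: one or more activities grouped for execution.
pipeline = {
    "name": "NightlyImport",
    "properties": {"activities": [copy_activity]},
}

# Data factory: one or more pipelines.
data_factory = {"name": "MyFactory", "pipelines": [pipeline]}

print(data_factory["pipelines"][0]["properties"]["activities"][0]["name"])
```

The key idea is the reference-by-name pattern: activities do not embed connection details directly, they point at datasets and linked services by name, so the same linked service can back many activities.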