7 Batch queries with Azure Data Lake Analytics

This chapter covers:

  • Writing job queries using U-SQL
  • Creating U-SQL jobs
  • Creating a Data Lake Analytics service
  • Estimating appropriate parallelization for U-SQL jobs

In the last chapter, you used Azure Stream Analytics as a source of raw data by running a passthrough query. The passthrough query takes the incoming data and passes it, unchanged, to the output: in this case, files in an Azure Data Lake store (ADL).
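
To refresh your memory, a passthrough query does no filtering or transformation at all. A minimal sketch follows; the bracketed input and output aliases are placeholders for whatever names you configured in your Stream Analytics job.

SELECT
    *
INTO
    [datalake-output]
FROM
    [eventhub-input]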

Figure 7.1. Lambda architecture with Azure PaaS Speed layer

This passthrough query is the last piece of prep work for batch processing, joining earlier steps like loading files into storage and saving groups of messages into files. The Azure Storage account, Data Lake store, and Event Hubs services lay the foundation for building a batch processing analytics system in Azure. With files in the ADL store, you're ready to start batch processing.
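
Before diving in, here's a preview of the shape every U-SQL job in this chapter takes: read files into a rowset, transform it with expressions, and write the result back out. This is only a sketch; the file paths, column names, and types below are illustrative assumptions, and each piece (extractors, expressions, file selectors, and outputters) is covered in the sections that follow.

// Read a delimited file from the Data Lake store into a rowset.
// (The path and schema here are illustrative assumptions.)
@readings =
    EXTRACT Id int,
            NodeValue decimal,
            EventTime DateTime
    FROM "/Staging/sensordata.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// Transform the rowset with a SQL-like expression.
@averages =
    SELECT Id,
           AVG(NodeValue) AS AvgValue
    FROM @readings
    GROUP BY Id;

// Write the aggregated rowset back to the Data Lake store.
OUTPUT @averages
TO "/Curated/averages.csv"
USING Outputters.Csv(outputHeader: true);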

7.1  U-SQL language

7.1.1  Extractors

7.1.2  Outputters

7.1.3  File selectors

7.1.4  Expressions

7.2  U-SQL jobs

7.2.1  Selecting the biometric data files

7.2.2  Schema extraction

7.2.3  Aggregation

7.2.4  Writing files

7.3  Creating a Data Lake Analytics service

7.3.1  Using Azure Portal

7.3.2  Using Azure PowerShell

7.4  Submitting jobs to ADLA

7.4.1  Using Azure Portal

7.4.2  Using Azure PowerShell

7.5  Efficient U-SQL job executions

7.5.1  Monitoring a U-SQL job

7.5.2  Analytics units

7.5.3  Vertexes

7.5.4  Scaling the job execution