In the last chapter, you used Azure Stream Analytics as a source of raw data by running a passthrough query. A passthrough query takes incoming data and passes it unchanged to the output, in this case files in Azure Data Lake Storage (ADLS). Figure 7.1 shows this use of Stream Analytics in parallel with the serving layer.
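As a reminder, a passthrough query in the Stream Analytics query language simply selects every field from the input and routes it to the output. The sketch below uses hypothetical input and output aliases, [hubinput] and [lakeoutput]; substitute the aliases configured on your own job.

-- Passthrough query: select every field from the input stream and
-- write it unchanged to the output.
-- [hubinput] and [lakeoutput] are hypothetical alias names, not ones
-- defined earlier in the book.
SELECT *
INTO [lakeoutput]
FROM [hubinput]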
This is the latest example of prep work for batch processing, which includes loading files into storage and saving groups of messages into files. The Azure Storage, Data Lake Storage, and Event Hubs services lay the foundation for building a batch processing analytics system in Azure. With files in the ADLS store, you're ready to start batch processing.
In this chapter, you’ll learn how to use Azure Data Lake Analytics (ADLA) to run analysis over data stored in semi-structured files. ADLA powers the batch processing pillar of the Lambda architecture. Figure 7.2 shows ADLA as the focus of the batch layer. ADLA uses Azure’s unbounded fast storage and readily available processing nodes to make analyzing file-based data sets as easy as analyzing relational database data sets.
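To give you a feel for what that looks like, here is a minimal U-SQL sketch of the kind of script you'll build in this chapter. It reads a set of delimited files from the Data Lake store, aggregates the rows with familiar SQL syntax, and writes a summary file back to the store. The folder paths, schema, and column names are hypothetical placeholders, not values used elsewhere in the book.

// Read all CSV files in a (hypothetical) staging folder, skipping the
// header row in each file. The schema is an assumed example.
@sensors =
    EXTRACT SensorId string,
            ReadingTime DateTime,
            Temperature double
    FROM "/Staging/sensors/{*}.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// Project the reading date so it can be used as a grouping column.
@readings =
    SELECT SensorId,
           ReadingTime.Date AS ReadingDate,
           Temperature
    FROM @sensors;

// Aggregate with standard SQL syntax, just as you would over a
// relational table.
@daily =
    SELECT SensorId,
           ReadingDate,
           AVG(Temperature) AS AvgTemperature
    FROM @readings
    GROUP BY SensorId, ReadingDate;

// Write the summary back to a (hypothetical) curated folder.
OUTPUT @daily
TO "/Curated/daily_temperature.csv"
USING Outputters.Csv(outputHeader: true);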