In previous chapters you’ve seen examples of prep work for batch processing, loading files into storage, and saving groups of messages into files. Storage accounts, Data Lake, and Event Hubs form the foundation for building a batch processing analytics system in Azure. In this chapter, you’re going to see how these services support stream processing too.
Stream processing covers running an operation on individual pieces of data from an endless sequence, or on multiple pieces of data in a time-ordered sequence. These two approaches are called one-at-a-time (or real-time) stream processing and micro-batch processing.
Figure 6.1 shows two queries processing a stream of data. One query checks every new data item and returns an output for each match. The other query counts how many items were submitted during a repeating time frame. The data is organized by time. Data in files from Azure Storage and messages from ingestion services like Event Hubs can both feed into stream processors. Stream processors generate results in real time rather than on demand. The query is registered once, and results are output repeatedly.
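The two query styles from figure 6.1 can be sketched in a few lines of Python. This is an illustrative sketch only, not an Azure API: the `Event` type, field names, and an in-memory list standing in for an Event Hubs stream are all assumptions made for the example. The first function emits an output for every matching item (one-at-a-time processing); the second counts items per repeating time frame (micro-batch processing over a tumbling window).

```python
from dataclasses import dataclass

# Hypothetical event type for illustration; the field names are assumptions.
@dataclass
class Event:
    timestamp: float  # seconds since the start of the stream
    value: int

def filter_stream(events, predicate):
    """One-at-a-time: emit an output for each event that matches."""
    for e in events:
        if predicate(e):
            yield e

def tumbling_count(events, window_seconds):
    """Micro-batch: count how many events fall in each fixed, repeating window."""
    counts = {}
    for e in events:
        window = int(e.timestamp // window_seconds)
        counts[window] = counts.get(window, 0) + 1
    return counts

# A small in-memory "stream" standing in for messages from an ingestion service.
stream = [Event(t, v) for t, v in [(0.5, 9), (1.2, 3), (2.8, 7), (3.1, 2), (4.9, 8)]]

matches = list(filter_stream(stream, lambda e: e.value > 5))  # events with values 9, 7, 8
per_window = tumbling_count(stream, window_seconds=2)         # counts per 2-second window
```

In a real stream processor the query is registered once and runs continuously as new events arrive; here the finite list simply makes the two output shapes visible: one result per matching item versus one count per time window.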