We’ve come a long way in this book: we’ve covered data ingestion and storage, all the different workloads a data platform runs, and multiple aspects of data governance. This final chapter is all about the output of our data platform: how data leaves our systems to be consumed by users or other systems. Figure 11.1 highlights this last focus area.
Figure 11.1 Data distribution covers the output of our data platform and how data leaves our system.
We’ll start by talking about data distribution in general and then look at some common patterns for it. In some cases, distribution can be easily achieved with SaaS (software as a service) solutions like Power BI for publishing reports. Other times, we might need to stand up some infrastructure to support it. Two common consumption patterns are low-volume/high-frequency (small amounts of data requested often) and high-volume/low-frequency (large amounts of data requested rarely).
We’ll talk about building a data API, how a data API can support low-volume/high-frequency consumption, the advantages of having an API layer, and some of the trade-offs. We’ll introduce Azure Cosmos DB and show why it is a great option for a data backend. We’ll also briefly touch on serving ML models specifically, as Azure Machine Learning (AML) offers this capability.