11 Distributing data
In this chapter:
- Sharing data through an API
- Sharing data for bulk copy
- Data sharing best practices
We’ve come a long way: we covered data ingestion and storage, all the different workloads a data platform runs, and multiple aspects of data governance. This final chapter is all about the output of our data platform – how data leaves our systems to be consumed by users or other systems. Figure 11.1 highlights this last focus area.
Figure 11.1 Data distribution is concerned with the output of our data platform and how data leaves our system.
We’ll start by talking about data distribution in general and some common patterns for it. In some cases, distribution can be easily achieved with SaaS solutions like Power BI for publishing reports. Other times, we might need to stand up some infrastructure to support it. Two common consumption patterns are low-volume/high-frequency (many small requests) and high-volume/low-frequency (occasional large copies).
We’ll talk about building a data API, how a data API can support low-volume/high-frequency consumption, the advantages of having an API layer, and some of the tradeoffs. We’ll introduce Azure Cosmos DB and show why it is a great option for a data backend. We’ll also briefly touch on serving ML models specifically, since this is a capability offered by Azure Machine Learning.