11 Distributing data

In this chapter:

Sharing data through an API
Sharing data for bulk copy
Data sharing best practices

We’ve come a long way: we covered data ingestion and storage, the different workloads a data platform runs, and multiple aspects of data governance. This final chapter is all about the output of our data platform: how data leaves our systems to be consumed by users or other systems. Figure 11.1 highlights this last focus area.

Figure 11.1 Data distribution is concerned with the output of our data platform and how data leaves our system.

We’ll start by talking about data distribution in general and some common patterns for it. In some cases, distribution can be easily achieved with SaaS solutions like Power BI for publishing reports. Other times, we might need to stand up some infrastructure to support it. Two common consumption patterns are low-volume/high-frequency (many small requests) and high-volume/low-frequency (occasional large transfers).

We’ll talk about building a data API: how it can support low-volume/high-frequency consumption, the advantages of having an API layer, and some of the tradeoffs. We’ll introduce Azure Cosmos DB and show why it is a great option for a data backend. We’ll also briefly touch on serving ML models specifically, since this is a capability offered by Azure Machine Learning.

11.1 Data distribution overview

11.2 Building a data API

11.2.1 Introducing Azure Cosmos DB

11.2.2 Populating the Cosmos DB collection

11.2.3 Retrieving data

11.2.4 Data API recap

11.2.5 Serving ML

11.3 Sharing data for bulk copy

11.3.1 Separating compute resources

Follower databases in Azure Data Explorer

11.3.2 Introducing Azure Data Share

11.3.3 Sharing data for bulk copy recap

11.4 Data sharing best practices

11.5 Summary