11 Distributing data

 

This chapter covers

  • Sharing data through an API
  • Sharing data for bulk copy
  • Data sharing best practices

We’ve come a long way in this book: we’ve covered data ingestion and storage, the different workloads a data platform runs, and multiple aspects of data governance. This final chapter is all about the output of our data platform, that is, how data leaves our systems to be consumed by users or other systems. Figure 11.1 highlights this last focus area.

Figure 11.1 Data distribution covers the output of our data platform and how data leaves our system.

We’ll start by talking about data distribution in general and then look at some common patterns for it. In some cases, distribution can be easily achieved with SaaS (software as a service) solutions like Power BI for publishing reports. Other times, we might need to stand up some infrastructure to support data distribution. Two common consumption patterns are low-volume/high-frequency (many small requests) and high-volume/low-frequency (occasional large copies).

We’ll talk about building a data API, how a data API can support low-volume/high-frequency consumption, the advantages of having an API layer, and some of the trade-offs. We’ll introduce Azure Cosmos DB and show why it is a great option for a data backend. We’ll also briefly touch on serving ML models specifically, as Azure Machine Learning (AML) offers this capability.
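To make the two consumption patterns concrete, here is a minimal Python sketch that contrasts them. It is an illustration only, not the implementation built later in the chapter: the `RECORDS` dictionary, the field names, and the `get_record` and `export_all` functions are all hypothetical, and an in-memory dictionary stands in for a real backend such as Azure Cosmos DB.

```python
import csv
import io

# Hypothetical in-memory dataset standing in for curated data in the platform.
RECORDS = {
    "u1": {"user_id": "u1", "page_views": 12},
    "u2": {"user_id": "u2", "page_views": 7},
    "u3": {"user_id": "u3", "page_views": 31},
}


def get_record(user_id):
    """Low-volume/high-frequency: return one small record per call.

    This is the kind of lookup a data API would serve many times a second.
    Returns None when the key is unknown.
    """
    return RECORDS.get(user_id)


def export_all():
    """High-volume/low-frequency: serialize the whole dataset in one pass.

    This is the kind of bulk copy a consumer might request occasionally.
    Returns the dataset as CSV text, with rows ordered by key.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["user_id", "page_views"], lineterminator="\n"
    )
    writer.writeheader()
    for key in sorted(RECORDS):
        writer.writerow(RECORDS[key])
    return buffer.getvalue()
```

A consumer that needs a single value calls something like `get_record("u2")` often and gets a small payload each time, while a consumer that wants its own copy of the dataset calls something like `export_all()` rarely and gets everything at once. The two access shapes put very different demands on the backend, which is why they tend to be served by different infrastructure.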

11.1 Data distribution overview

11.2 Building a data API

11.2.1 Introducing Azure Cosmos DB

11.2.2 Populating the Cosmos DB collection

11.2.3 Retrieving data

11.2.4 Data API recap

11.3 Serving machine learning

11.4 Sharing data for bulk copy

11.4.1 Separating compute resources

11.4.2 Introducing Azure Data Share

11.4.3 Sharing data for bulk copy recap

11.5 Data sharing best practices

Summary
