2 Storage

 

In this chapter:

  • Storing data in a data platform.
  • Using Azure Data Explorer for ingestion and analytics.
  • Using Azure Data Lake Storage for big data storage.
  • Applying data ingestion patterns.

Data storage is the core piece of a data platform, around which everything else is built. Figure 2.1 recaps the high-level view from chapter 1, highlighting the component discussed in this chapter.

Figure 2.1 Storage is the core piece of a data platform around which everything else is built. Data gets ingested into the storage layer and is distributed from there. All workloads (data processing, analytics, machine learning) access this layer.

We will talk about storage solutions and their tradeoffs, introduce the two Azure services we will use, and show how they integrate.

Data moves continuously in and out of the data platform. In this chapter we will focus on storage and on the need to accommodate multiple storage solutions, both outside and inside the data platform. We will sketch out the storage layer of our data platform, then stand up the corresponding Azure services.

We will deploy a cluster of Azure Data Explorer, Microsoft’s big data analytics platform. We will create a table, ingest some data into it, then look at a few basic queries written in KQL, the Kusto Query Language used by Azure Data Explorer. Kusto was the codename of Azure Data Explorer before it launched as a public service. You might sometimes encounter “Kusto” instead of “Azure Data Explorer”; they are the same service.

2.1      Storing data in a data platform

2.1.1   Storing data across multiple data fabrics

2.1.2   Having a single source of truth

2.2      Introducing Azure Data Explorer

2.2.1   Deploying an Azure Data Explorer cluster

2.2.2   Using Azure Data Explorer

2.2.3   Working around query limits

2.3      Introducing Azure Data Lake Storage

2.3.1   Creating an Azure Data Lake Storage account

2.3.2   Using Azure Data Lake Storage

2.3.3   Integrating with Azure Data Explorer

2.4      Ingesting data

2.4.1   Ingestion frequency

2.4.2   Load type

2.4.3   Restatements and reloads

2.5      Summary
