2 Storage


This chapter covers

  • Storing data in a data platform
  • Using Azure Data Explorer for ingestion and analytics
  • Using Azure Data Lake Storage for big data storage
  • Applying data ingestion patterns

Data storage is the core piece of a data platform around which everything else is built. This chapter focuses on storage solutions and their trade-offs. We'll also introduce the two Azure services we will use for storage, Azure Data Explorer and Azure Data Lake Storage, and discuss how they integrate. Figure 2.1 recaps the high-level view from chapter 1 and highlights the component discussed in this chapter.

Figure 2.1 Storage is the core piece of a data platform around which everything else is built. Data gets ingested into the storage layer and is distributed from there. All workloads (data processing, analytics, and machine learning) access this layer.

Because data moves continuously in and out of the data platform, we also need to accommodate multiple storage solutions, both external to and inside the data platform. We will sketch out the storage layer of our data platform, then stand up the corresponding Azure services.

2.1 Storing data in a data platform

2.1.1 Storing data across multiple data fabrics

2.1.2 Having a single source of truth

2.2 Introducing Azure Data Explorer

2.2.1 Deploying an Azure Data Explorer cluster

2.2.2 Using Azure Data Explorer

2.2.3 Working around query limits

2.3 Introducing Azure Data Lake Storage

2.3.1 Creating an Azure Data Lake Storage account

2.3.2 Using Azure Data Lake Storage

2.3.3 Integrating with Azure Data Explorer

2.4 Ingesting data

2.4.1 Ingestion frequency

2.4.2 Load type
