4 Azure Data Lake storage

 

This chapter covers:

  • Setting up a Data Lake store
  • Configuring file access in Data Lake storage
  • Understanding and planning for data drift

In the last chapter, you learned how to work with a fundamental service in Azure, the Storage account. Storage accounts provide nearly unlimited storage for many Azure services, with high throughput and high redundancy. A Storage account also hosts other file-based services, such as file shares and queues.

In this chapter, you’ll learn about another storage service, the Azure Data Lake store. You’ll create a Data Lake store and learn how to structure your data lake to increase maintainability and security around your data. You’ll learn how this service supports other Azure services through Azure Active Directory authentication. The storage system will be the central service around which you construct the analytics system.

Azure Data Lake store (ADL) resembles a local file system, with folders and files. Azure Active Directory (AAD) controls access to folders and files, with assignable read/write/execute permissions. ADL provides the primary storage backbone for the master data set, a source of data for batch layer processing. ADL also stores batch analysis artifacts, including the report files that make up the output of the serving layer (Figure 1.2).
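To make the folder-and-permission model concrete, here is a minimal Python sketch of a file tree with read/write/execute entries per principal. It is an illustration only and does not call any Azure API: the `Node` class, the `can_read` helper, the principal name `analyst`, and the folder and file names are all hypothetical. The access rule it encodes (execute on every parent folder, read on the target) is an assumption based on POSIX-style permission checks, which this kind of read/write/execute scheme generally follows.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of a data lake folder tree (not the ADL API).
READ, WRITE, EXECUTE = "r", "w", "x"


@dataclass
class Node:
    name: str
    is_folder: bool
    acl: dict = field(default_factory=dict)       # principal -> set of permissions
    children: dict = field(default_factory=dict)  # child name -> Node


def find_path(root, path):
    """Return the nodes from the root down to the target of a '/'-separated path."""
    nodes = [root]
    for part in (p for p in path.split("/") if p):
        nodes.append(nodes[-1].children[part])
    return nodes


def can_read(root, path, principal):
    """Assumed rule: execute on every ancestor folder, plus read on the target."""
    *ancestors, target = find_path(root, path)
    traversable = all(EXECUTE in n.acl.get(principal, set()) for n in ancestors)
    return traversable and READ in target.acl.get(principal, set())


# Example tree: /raw/sales.csv is readable by "analyst", /raw/secret.csv is not.
root = Node("/", True, acl={"analyst": {EXECUTE}})
raw = Node("raw", True, acl={"analyst": {READ, EXECUTE}})
raw.children["sales.csv"] = Node("sales.csv", False, acl={"analyst": {READ}})
raw.children["secret.csv"] = Node("secret.csv", False)
root.children["raw"] = raw

print(can_read(root, "/raw/sales.csv", "analyst"))
print(can_read(root, "/raw/secret.csv", "analyst"))
```

In this model, removing the execute permission from a parent folder blocks access to everything beneath it, which is one reason the folder hierarchy deserves planning before data arrives.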

Figure 4.1. Lambda architecture with Azure PaaS services

4.1  Create an Azure Data Lake store

4.1.1  Using Azure Portal

4.1.2  Using Azure PowerShell

4.2  Data Lake store access

4.2.1  Access schemes

4.2.2  Configuring ADL access

4.2.3  Hierarchy structure in Data Lake store

4.3  Storage folder structure and data drift

4.3.1  Hierarchy structure revisited

4.3.3  Data drift

4.4  Copy tools for Data Lake store

4.4.1  Data explorer

4.4.2  ADLCopy tool

4.4.3  Azure Storage Explorer tool

4.5  Exercises

4.5.1  Exercise 1

4.5.2  Exercise 2

4.6  Summary
