This chapter covers:
- Setting up a Data Lake store
- Configuring file access in Data Lake storage
- Understanding and planning for data drift
In the last chapter, you learned how to work with a fundamental service in Azure, the Storage account. Storage accounts provide nearly unlimited storage for many Azure services, with high throughput and high redundancy. A Storage account also hosts other storage services, such as file shares and queues.
In this chapter, you’ll learn about another storage service, the Azure Data Lake store. You’ll create a Data Lake store and learn how to structure your data lake to increase maintainability and security around your data. You’ll learn how this service supports other Azure services through Azure Active Directory authentication. The storage system will be the central service around which you construct the analytics system.
Azure Data Lake store (ADL) resembles a local file system, with folders and files. Azure Active Directory (AAD) controls access to folders and files, with assignable read/write/execute permissions. ADL provides the primary storage backbone for the master data set, a source of data for batch layer processing. ADL also stores batch analysis artifacts, including the report files that make up the output of the Serving Layer (Figure 1.2).
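To make the read/write/execute model concrete, here is a minimal Python sketch of how folder and file permissions might combine when a principal reads a file. It is an in-memory illustration only, not the ADL API: the `Perm`, `Node`, and `can_read_file` names, the folder names, and the `analyst-group` and `etl-service` principals are all hypothetical. The sketch assumes POSIX-style behavior, where reading a file requires execute permission on every folder along the path and read permission on the file itself.

```python
from dataclasses import dataclass, field
from enum import Flag
from typing import Dict, List


class Perm(Flag):
    """POSIX-style permission bits (illustrative model only)."""
    NONE = 0
    EXECUTE = 1
    WRITE = 2
    READ = 4


@dataclass
class Node:
    """A folder or file with a per-principal permission table (hypothetical)."""
    name: str
    acl: Dict[str, Perm] = field(default_factory=dict)

    def allows(self, principal: str, perm: Perm) -> bool:
        # A principal missing from the table gets no permissions.
        return perm in self.acl.get(principal, Perm.NONE)


def can_read_file(path: List[Node], principal: str) -> bool:
    """Return True if the principal can read the last node in the path.

    Assumes every earlier node is a folder that must grant EXECUTE
    (traverse), and the final node is a file that must grant READ.
    """
    *folders, target = path
    if not all(folder.allows(principal, Perm.EXECUTE) for folder in folders):
        return False
    return target.allows(principal, Perm.READ)


# Hypothetical layout: /raw/sales/2019-01.csv
raw = Node("raw", {"analyst-group": Perm.EXECUTE,
                   "etl-service": Perm.READ | Perm.WRITE | Perm.EXECUTE})
sales = Node("sales", {"analyst-group": Perm.READ | Perm.EXECUTE,
                       "etl-service": Perm.READ | Perm.WRITE | Perm.EXECUTE})
report = Node("2019-01.csv", {"analyst-group": Perm.READ})

analyst_can_read = can_read_file([raw, sales, report], "analyst-group")
etl_can_read = can_read_file([raw, sales, report], "etl-service")
```

In this sketch, `analyst-group` can read the file because it can traverse both folders and holds read permission on the file. `etl-service` can traverse both folders but has no entry on the file, so the read is denied. Planning folder-level permissions with this traversal rule in mind helps avoid surprises when you grant access to only part of the lake.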