8 Metadata

 

In this chapter:

  • Managing metadata for understanding data
  • Introducing Azure Purview
  • Maintaining a data dictionary and a data glossary
  • Understanding advanced features of Azure Purview

This chapter is all about metadata: in other words, data about the data. Metadata is one aspect of data governance. We will cover two other important aspects in the following chapters: data quality (in chapter 9) and compliance (in chapter 10). Figure 8.1 highlights our current area of focus. We won’t show this map of our data platform again until the last chapter, which covers data distribution.

Figure 8.1 Data governance deals with multiple aspects of managing data, including metadata, data quality, access control, and compliance with laws and standards.

We’ll start by outlining the information architecture challenges a big data platform encounters and how metadata can help address them. We’ll introduce two important concepts: data dictionaries and data glossaries. Using these, we can inventory our datasets and queries.

Next, we’ll look at Azure Purview, the Azure data governance service that can help us manage our metadata. We’ll spin up a new instance of Azure Purview and go over some of its key features. At the time of writing, the service has only recently launched, and there is no Azure CLI support for it. Unlike in other chapters, where we were able to automate our steps via Azure CLI, this time around we will be working mostly through the UI.

8.1      Making sense of the data

8.2      Introducing Azure Purview

8.3      Maintaining a data inventory

8.3.1   Setting up a scan

8.3.2   Browsing the data dictionary

8.3.3   Data dictionary recap

8.4      Managing a data glossary

8.4.1   Adding a new glossary term

8.4.2   Curating terms

8.4.3   Custom templates and bulk import

8.4.4   Data glossary recap

8.5      Understanding Azure Purview advanced features

8.5.1   Tracking lineage

8.5.2   Classification rules

8.5.3   REST API

8.5.4   Advanced features recap

8.6      Summary
