7 Metadata layer architecture

This chapter covers

  • Understanding data platform technical metadata vs. business metadata
  • Leveraging metadata to simplify data platform management
  • Architecting the optimal metadata layer
  • Designing a metadata model with multiple domains
  • Understanding metadata layer implementation options
  • Evaluating commercial and open source metadata options

In this chapter, we’ll explain what we mean by data platform internal metadata and why it is important to operating a data platform.

We’ll cover the difference between configuration metadata and activity metadata and show how each can be used, with examples drawn from a data platform of growing complexity. We’ll also show why the metadata layer should become the primary interface for data engineers and advanced data users.

We will describe a generic metadata model with four main domains: pipeline metadata, data quality checks, pipeline activity, and schema registry. We have found this model to work across different organizations, and we focus on the aspects of metadata that we found to be largely universal.
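To make the four domains concrete before we go into detail, here is a minimal Python sketch of how they might sit side by side. It is not the model developed later in the chapter: every class name, field, and sample value below is hypothetical and chosen only for illustration. The split into configuration records (what a pipeline should do) and activity records (what happened on a run) is our reading of the distinction introduced above.

```python
from dataclasses import dataclass, field


# All names and fields below are illustrative assumptions, not a prescribed schema.

@dataclass
class PipelineConfig:
    """Pipeline metadata domain: describes what a pipeline should do."""
    pipeline_id: str
    source: str
    destination: str


@dataclass
class DataQualityCheck:
    """Data quality checks domain: a rule attached to a pipeline."""
    check_id: str
    pipeline_id: str
    rule: str


@dataclass
class PipelineActivity:
    """Pipeline activity domain: records what happened on one run."""
    pipeline_id: str
    status: str
    rows_processed: int


@dataclass
class SchemaVersion:
    """Schema registry domain: one version of a source's schema."""
    source: str
    version: int
    columns: dict[str, str]


@dataclass
class MetadataLayer:
    """Holds the four domains together in memory for this sketch."""
    pipelines: dict[str, PipelineConfig] = field(default_factory=dict)
    checks: list[DataQualityCheck] = field(default_factory=list)
    activity: list[PipelineActivity] = field(default_factory=list)
    schemas: list[SchemaVersion] = field(default_factory=list)

    def failed_runs(self, pipeline_id: str) -> list[PipelineActivity]:
        """Return the recorded runs of a pipeline whose status is 'failed'."""
        return [
            run for run in self.activity
            if run.pipeline_id == pipeline_id and run.status == "failed"
        ]


# Hypothetical sample data.
layer = MetadataLayer()
layer.pipelines["orders"] = PipelineConfig("orders", "landing/orders", "warehouse.orders")
layer.checks.append(DataQualityCheck("c1", "orders", "order_id is not null"))
layer.schemas.append(SchemaVersion("landing/orders", 1, {"order_id": "string"}))
layer.activity.append(PipelineActivity("orders", "succeeded", 1200))
layer.activity.append(PipelineActivity("orders", "failed", 0))

print(len(layer.failed_runs("orders")))
```

Even in this toy form, a question such as "which runs of this pipeline failed?" is answered by querying metadata rather than by inspecting pipeline code, which is the kind of use the chapter has in mind.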

7.1 What we mean by metadata

7.1.1 Business metadata

7.1.2 Data platform internal metadata or “pipeline metadata”

7.2 Taking advantage of pipeline metadata

7.3 Metadata model

7.3.1 Metadata domains

7.4 Metadata layer implementation options

7.4.1 Metadata layer as a collection of configuration files

7.4.2 Metadata database

7.4.3 Metadata API

7.5 Overview of existing solutions

7.5.1 Cloud metadata services

7.5.2 Open source metadata layer implementations

7.6 Exercise answers