Metadata Layer Architecture

This chapter covers:

  • A definition of data platform metadata and how it differs from business metadata
  • How to architect the optimal metadata layer for the size and complexity of your system and organization
  • Designing a metadata model with multiple domains: pipeline configuration, data quality checks, and pipeline activity
  • Metadata layer implementation options
  • Existing commercial and open source options for metadata layer implementation

By the end of this chapter you’ll be able to:

  • Architect an appropriate metadata layer for the size and complexity of your system and organization
  • Leverage metadata to simplify the management of your data platform
  • Evaluate which of the commercial and open source options might be worth exploring for your use case

In this chapter, we’ll clarify what we mean by data platform internal metadata and why it matters to the operation of a data platform.

We’ll cover the difference between configuration metadata and activity metadata and how each can be used, with examples drawn from a data platform of growing complexity. We’ll also show why the metadata layer should become the primary interface for data engineers and advanced data users.

7.1 What we mean by metadata

7.1.1 Business metadata

7.1.2 Data Platform internal metadata or “pipeline metadata”

7.2 Taking advantage of pipeline metadata

7.3 Metadata model

7.3.1 Metadata domains

7.4 Metadata layer implementation options

7.4.1 Metadata layer as a collection of configuration files

7.4.2 Metadata database

7.4.3 Metadata API

7.5 Overview of existing solutions

7.5.1 Cloud metadata services

7.5.2 Open source metadata layer implementations

7.6 Summary

7.7 Exercise Answers
