8 Metadata and artifact store

 

This chapter covers

  • Understanding and managing metadata in the deep learning context
  • Designing a metadata and artifact store to manage metadata
  • Introducing two open source metadata management tools: ML Metadata and MLflow

To produce a high-quality model that meets business requirements, data scientists experiment with many different datasets, data-processing techniques, and training algorithms. They spend a significant amount of time running these experiments to build and ship the best model.

Model training experiments produce a variety of artifacts (such as datasets and model files) as well as metadata. The metadata may include training algorithms, hyperparameters, training metrics, and model versions, all of which are helpful for analyzing model performance. To be useful, this data must be persisted and easily retrievable.
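To make this concrete, the following sketch shows how such metadata and artifacts might be recorded as a tracked experiment run. It uses MLflow's tracking API, which we introduce in section 8.4.2; the experiment name, hyperparameter values, metric values, and model file path are made up for illustration.

import mlflow

# Group related runs under one experiment; the name is arbitrary here.
mlflow.set_experiment("intent-classification")

with mlflow.start_run(run_name="baseline-lstm"):
    # Metadata: the training algorithm and hyperparameters for this run
    mlflow.log_params({"algorithm": "lstm", "learning_rate": 0.001, "epochs": 3})

    # Metadata: training metrics recorded per epoch
    for epoch, accuracy in enumerate([0.72, 0.81, 0.86]):
        mlflow.log_metric("val_accuracy", accuracy, step=epoch)

    # Artifact: persist the trained model file alongside the run's metadata
    # (a placeholder file stands in for a real model checkpoint)
    with open("model.pt", "wb") as f:
        f.write(b"placeholder model bytes")
    mlflow.log_artifact("model.pt")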

When data scientists need to investigate a model performance problem or compare different training experiments, is there anything we, as engineers, can do to facilitate these efforts? For example, can we make reproducing models and comparing experiments easier?
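As a preview of what a metadata store makes possible, the short sketch below queries the runs logged in the previous example and ranks them by validation accuracy, so different experiments can be compared side by side. It again uses MLflow's tracking API; the experiment, parameter, and metric names carry over from the earlier sketch and are illustrative only.

import mlflow

# Look up the experiment logged earlier and pull all of its runs
# into a pandas DataFrame (one row per run, with params and metrics)
experiment = mlflow.get_experiment_by_name("intent-classification")
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.val_accuracy DESC"],
)

# Compare the metadata that matters for model selection side by side
print(runs[["run_id", "params.learning_rate", "metrics.val_accuracy"]])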

8.1 Introducing artifacts

8.2 Metadata in a deep learning context

8.2.1 Common metadata categories

8.2.2 Why manage metadata?

8.3 Designing a metadata and artifact store

8.3.1 Design principles

8.3.2 A general metadata and artifact store design proposal

8.4 Open source solutions

8.4.1 ML Metadata

8.4.2 MLflow

8.4.3 MLflow vs. MLMD

Summary