8 Metadata and artifact store

This chapter covers

  • Understanding and managing metadata in the deep learning context
  • Designing a metadata and artifact store to manage metadata
  • Introducing two open source metadata management tools: MLMD and MLflow

To produce a high-quality model that fits the business requirements, data scientists need to experiment with many kinds of datasets, data processing techniques, and training algorithms. To build and ship the best model, they spend a significant amount of time conducting these experiments.

Model training experiments produce a variety of artifacts (such as datasets and model files) and metadata. That metadata may include model algorithms, hyperparameters, training metrics, and model versions, all of which are useful for analyzing model performance. To be useful, this data must be persistent and retrievable.
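The persist-and-retrieve idea can be sketched as a toy metadata store. This is only an illustration of the concept, not a real tool: the class and method names (`TinyMetadataStore`, `log_run`, `get_runs`) are invented for this example, and a production store would use a proper database rather than a JSON-lines file.

```python
import json
import os
import tempfile

class TinyMetadataStore:
    """Toy store that persists one JSON record per training run."""

    def __init__(self, path):
        self.path = path

    def log_run(self, run_id, algorithm, hyperparams, metrics):
        # Append the run's metadata as a single JSON line.
        record = {
            "run_id": run_id,
            "algorithm": algorithm,
            "hyperparams": hyperparams,
            "metrics": metrics,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def get_runs(self):
        # Load every persisted run so experiments can be compared.
        with open(self.path) as f:
            return [json.loads(line) for line in f]

# Usage: log two runs, then retrieve them to find the best one.
store = TinyMetadataStore(os.path.join(tempfile.mkdtemp(), "runs.jsonl"))
store.log_run("run-1", "resnet50", {"lr": 0.01}, {"accuracy": 0.91})
store.log_run("run-2", "resnet50", {"lr": 0.001}, {"accuracy": 0.94})
best = max(store.get_runs(), key=lambda r: r["metrics"]["accuracy"])
print(best["run_id"])  # run-2
```

Tools like MLflow and MLMD, introduced later in this chapter, provide production-grade versions of exactly this logging-and-querying workflow.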

When data scientists need to investigate a model performance issue or compare different training experiments, is there anything we, as engineers, can do to facilitate these efforts? Could we, for example, make model reproduction and experiment comparison easier?

8.1  Introducing artifacts

8.2  Metadata in a deep learning context

8.2.1  Common metadata categories

8.2.2  Why manage metadata?

8.3  Designing a metadata and artifacts store

8.3.1  Design principles

8.3.2  A general metadata and artifact store design proposal

8.4  Open source solutions

8.4.1  ML Metadata

8.4.2  MLflow

8.4.3  MLflow vs. ML Metadata (MLMD)

8.5  Summary