9 Model Training and Validation: Part 2
This chapter covers
- Storing and retrieving datasets with VolumeOps
- Using MLflow and TensorBoard to track and visualize training
- Understanding the importance of lineage and experiment tracking
We left off in the last chapter after creating a simple pipeline to train the image detection model, and we talked a bit about storing and retrieving datasets. We also tried out the model locally to get a feel for how it works and to sort out obvious flaws. In this chapter, we take this a step further and implement steps that improve the robustness of the training pipeline and, more importantly, bring visibility into the training process. As we build more models and the number of stakeholders in the model lifecycle increases, traceability and asynchronous observability of the training process become more important. We dive into TensorBoard, which provides visibility into model training, and then switch focus to MLflow, which enables lineage and model versioning. While these concepts are not strictly necessary to train a model, they help us deliver models to production more comfortably and repeatably, with deterministic results.
Let's start with a different way of accessing data within a pipeline: the VolumeOp.