Building on the foundational infrastructure from part 1, we now tackle a critical challenge in ML engineering: how to reliably track, reproduce, and deploy ML experiments and models. Successfully productionizing ML requires more than infrastructure alone. Ad hoc experimentation and manual processes undermine reproducibility and scalability, so the unique stages of the ML life cycle demand dedicated tools and practices.
This part focuses on constructing the core components of a practical ML platform, transforming ad hoc processes into production-ready systems. You’ll learn how to use MLflow for robust experiment tracking and model management; implement a feature store with Feast to ensure feature consistency and address training-serving skew; orchestrate automated, multi-step workflows using Kubeflow Pipelines; and deploy models as scalable services while monitoring for critical issues (e.g., data drift) using tools such as BentoML and Evidently.