As we move deeper into ML engineering, we now tackle a critical challenge: how to reliably track and reproduce ML experiments and deploy the resulting models. This chapter introduces essential tools that turn ad hoc experimentation into production-ready ML workflows. We’ll build a practical ML platform that improves reliability while remaining flexible enough for real-world applications.
In particular, we’ll explore the individual components of the ML platform discussed in chapter 1, section 1.3. We’ll look at tools that help us track our data science experiments, store model features, orchestrate pipelines, and deploy models. Our goal is to build a fully functional mini ML platform from these tools while highlighting how they interact.
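To make experiment tracking concrete before we dive into specific platform choices, here is a minimal sketch of what logging a run can look like, assuming MLflow as the tracker; the experiment name and the parameter and metric values shown are placeholders, not results from this chapter.

```python
# A minimal experiment-tracking sketch, assuming MLflow as the tracker.
# The experiment name and logged values below are hypothetical placeholders.
import mlflow

mlflow.set_experiment("demo-experiment")  # group related runs together

with mlflow.start_run():
    # Record the configuration that produced this run...
    mlflow.log_param("model_type", "random_forest")
    mlflow.log_param("n_estimators", 100)
    # ...and the resulting metrics, so runs can be compared and reproduced later.
    mlflow.log_metric("accuracy", 0.93)
```

Each run's parameters and metrics are persisted by the tracking backend, which is what lets us compare experiments side by side later in the chapter.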
We’ll start our ML journey the way most data scientists do: by understanding the data. We’ll perform some exploratory data analysis (EDA), split our dataset into training and testing sets, and train multiple models to find the one that performs best. The initial stages of a data science project are mostly exploratory, so we’ll experiment with different features, model hyperparameters, and frameworks.
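As a preview of that exploratory loop, the sketch below runs basic EDA, a train/test split, and a small model comparison with pandas and scikit-learn. It uses scikit-learn's built-in breast cancer dataset as a stand-in for the chapter's actual data, and the two candidate models are illustrative choices, not the chapter's final selection.

```python
# A sketch of the exploratory workflow: EDA, train/test split, model comparison.
# Uses a built-in dataset as a stand-in for the chapter's actual data.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the data as a DataFrame and take a first look (EDA).
data = load_breast_cancer(as_frame=True)
df = data.frame
print(df.describe())                 # summary statistics per feature
print(df["target"].value_counts())  # class balance

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train a few candidate models and compare their test accuracy.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.3f}")
```

In practice we'd iterate on this loop many times, swapping features, hyperparameters, and frameworks, which is exactly why the tracking and orchestration tools introduced in this chapter matter.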