16 Production infrastructure


This chapter covers

  • Using and implementing a model registry in passive retraining paradigms so that ML solutions adapt to drift (a minimal registration sketch follows this list).
  • Using a feature store to eliminate rework and to share computed features throughout an organization.
  • Selecting an appropriate serving-layer architecture to minimize both complexity and cost for a particular use case.
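
To make the first of these concrete before diving in, here is a minimal sketch of registering a model with MLflow's model registry and promoting it to a stage. It assumes a reachable MLflow tracking server and scikit-learn installed; the stand-in model, the registered name churn_classifier, and the Staging target are all illustrative, not the chapter's prescribed workflow.

    import mlflow
    from mlflow.tracking import MlflowClient
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Train a stand-in model (illustrative only).
    X, y = make_classification(n_samples=200, random_state=42)
    model = LogisticRegression(max_iter=500).fit(X, y)

    # Log the model and register it under a (hypothetical) name in one step.
    with mlflow.start_run():
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            registered_model_name="churn_classifier",
        )

    # After validation, promote the newest version so downstream code can
    # load "models:/churn_classifier/Staging" without knowing the version.
    client = MlflowClient()
    latest = client.get_latest_versions("churn_classifier", stages=["None"])[0]
    client.transition_model_version_stage(
        name="churn_classifier",
        version=latest.version,
        stage="Staging",
    )

Keeping the stage transition separate from logging is deliberate: it lets a validation step sit between training and promotion, which is the hook a passive retraining loop needs.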

Utilizing ML in a real-world use case to solve a complex problem is challenging. The sheer number of skills needed to take a company’s data (frequently messy, partially complete, and rife with quality issues), select an appropriate algorithm, tune a pipeline, and validate that the prediction output of a model (or an ensemble of models) solves the problem to the satisfaction of the business is quite daunting. The complexity of an ML-backed project does not end with the creation of an acceptably performing model, though. The architectural decisions that remain, and the details of how they are implemented, can add significant challenges to a project if they are made incorrectly.

Every day there seems to be a new open source tech stack promising an easier deployment strategy or a magical automated solution that meets everyone’s needs. With this deluge of tools and platforms constantly being released, it is difficult to know where to turn to meet the needs of a particular project.

16.1 Artifact management

16.1.1 MLflow’s model registry

16.1.2 Interfacing with the model registry

16.2 Feature stores

16.2.1 What a feature store is used for

16.2.2 Using a feature store

16.2.3 Evaluating a feature store

16.3 Prediction serving architecture

16.3.1 Determining serving needs

16.3.2 Internal use cases

16.3.3 Bulk external delivery

16.3.4 Micro-batch streaming

16.3.5 Real-time server-side

16.3.6 Integrated models (edge deployment)

16.4 Summary