chapter eleven

11 Monitoring and explainability

This chapter covers

Setting up monitoring and logging for ML applications
Routing alerts using Alertmanager
Storing logs in Loki for scalable log aggregation and querying
Identifying data drift
Using model explainability to understand how the ML model makes its decisions

Moving models to production is only the first step. Keeping them performing reliably over time requires robust monitoring and an understanding of their behavior. In this chapter, we’ll explore how to implement comprehensive monitoring for ML systems and gain insight into how they make decisions (figure 11.1).

Figure 11.1 The mental map where we’re now focusing on model monitoring (8)

We’ll tackle monitoring from two angles. First, we’ll set up basic operational monitoring to ensure our services meet performance and reliability requirements. Then, we’ll implement ML-specific monitoring to detect data drift and track model behavior. Model monitoring therefore splits into two main components:

Basic monitoring
Data drift monitoring

11.1 Monitoring

11.1.1 Basic monitoring

11.1.2 Custom metrics

11.1.3 Logging

11.1.4 Alerting

11.2 Data drift detection

11.2.1 Object detection

11.2.2 Movie recommender

11.3 Explainability

11.3.1 Object detection

11.3.2 Movie recommendation

Summary