8 Monitoring


This chapter covers

  • Observing the health of Kafka Connect connectors
  • The most important metrics of Kafka Connect
  • Monitoring Kafka Streams applications
  • Detecting consumer applications that are lagging behind

The last chapter covered preparing Kafka Streams applications for production and running them on Kubernetes, a container orchestration system. It also discussed how to improve the robustness of Kafka Connect deployments. While packaging and deploying streaming applications is an essential part of running them in production, the journey doesn't end there. Once an application is up and running, we need to monitor it continuously to make sure it's operating as expected.

Streaming data pipelines consist of many different components and are built on top of various technologies. We use Apache Kafka to store event data, implement connectors with Kafka Connect, and build stateful stream processing with Kafka Streams. Whether it's a Kafka Streams app, a Kafka Connect connector, or any other Kafka workload, these applications process data and move it between systems around the clock. Manually overseeing streaming data pipelines is not an option, so we need automated means of monitoring their behavior and health. Fortunately, each of these technologies exposes metrics that can be used to determine its current operational state.
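As a first taste of what this looks like, every Kafka client, including a Kafka Streams application, exposes its metrics both over JMX and programmatically. The following minimal sketch (the application id, bootstrap server, and topic names are placeholders, not examples from this book) starts a trivial topology and dumps the current metric values to standard output:

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class MetricsProbe {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "metrics-probe");      // placeholder application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed local broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Trivial pass-through topology; the topics are placeholders
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        try (KafkaStreams streams = new KafkaStreams(builder.build(), props)) {
            streams.start();
            Thread.sleep(5_000);  // give the application a moment to start processing

            // Dump all metrics the Streams application currently exposes
            Map<MetricName, ? extends Metric> metrics = streams.metrics();
            metrics.forEach((name, metric) ->
                System.out.printf("%s / %s = %s%n", name.group(), name.name(), metric.metricValue()));
        }
    }
}

The same metrics are also available via JMX, which is how most monitoring systems collect them. The rest of this chapter looks at which of these metrics matter most for Kafka Connect connectors, Kafka Streams applications, and consumer lag.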

8.1 Resource consumption

8.2 Kafka Connect connectors

8.2.1 Health endpoint in the Kafka Connect REST API

8.2.2 Important JMX metrics

8.3 Kafka Streams applications

8.3.1 Client metrics

8.3.2 Thread metrics

8.3.3 Task metrics

8.3.4 Processor node metrics

8.4 Keeping track of consumer lags

8.4.1 Calculating consumer lags

8.4.2 Interpreting consumer lags

8.5 Summary