9 Performance
This chapter covers
- The relationship between throughput and latency
- Scaling connectors up and down
- Leveraging batching to increase throughput
- Reducing the amount of transferred data with compression
- Autoscaling Kafka Streams applications
We looked into monitoring streaming data pipelines in the last chapter. Capturing the right metrics lays the foundation for understanding the performance of your application and is essential to applying data streaming successfully in your projects. Collecting and viewing metrics about streaming data pipelines is only half the story, though; we also need to act on them.
What if, for instance, the metrics reveal that your Elasticsearch sink connector can’t keep up with the current load, publishing records to Elasticsearch at a lower rate than the upstream PostgreSQL source connector produces them? Applications built on top of Elasticsearch, such as a critical business dashboard, would no longer work with up-to-date data, effectively reducing the value of the streaming data pipeline.
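The impact of such a throughput mismatch compounds over time: the sink falls further behind for every second the imbalance persists. A back-of-the-envelope sketch makes this concrete (the rates below are hypothetical, chosen purely for illustration):

```python
# Hypothetical rates: the source connector produces records faster
# than the Elasticsearch sink connector can index them.
produce_rate = 1_000  # records/second written to the Kafka topic
consume_rate = 600    # records/second the sink connector sustains

def lag_after(seconds: int) -> int:
    """Records the sink connector has fallen behind after the given time."""
    return max(0, (produce_rate - consume_rate) * seconds)

# After one hour, the dashboard is already ~1.4 million records behind.
print(lag_after(3_600))  # → 1440000
```

This growing consumer lag is exactly the kind of signal the metrics from the previous chapter surface, and the techniques in this chapter, such as scaling connectors and batching, are ways to close the gap.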
Similarly, what if your Kafka Streams application, which performs fraud detection on financial transaction data, must call external APIs while streaming data from sources to sinks, and those API calls become slower over time, increasing the end-to-end latency of your streaming data pipeline? Processing the transactions would take longer and longer, negatively impacting your customers’ experience.