This chapter covers
- Mechanisms behind message cleanup in Kafka
- Managing message retention based on age and size
- Removing outdated data with log compaction
In Kafka, managing the life cycle of messages is crucial for maintaining system performance and ensuring data integrity. This chapter examines two approaches: log retention and log compaction. Log retention deletes messages based on their age or the size of the log; it is simple to configure and suits use cases such as compliance requirements and general storage management. Log compaction, in contrast, selectively removes outdated data based on message keys, so that only the latest message for each key is retained. Once you understand the principles and configuration options behind both mechanisms, you can tailor retention policies to your specific needs, optimizing storage usage while keeping your data accurate throughout the system.
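To give these configurations a concrete shape before we dig in, the following sketch uses Kafka's `AdminClient` to set age- and size-based retention on an existing topic. The topic name `orders`, the broker address `localhost:9092`, and the chosen limits are placeholders for illustration; the individual settings and their trade-offs are covered in the sections that follow.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class RetentionConfigSketch {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address for this example
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Target the topic-level configuration of the (hypothetical) topic "orders"
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");

            // Age-based retention: keep messages for 7 days (value in milliseconds)
            AlterConfigOp setRetentionMs = new AlterConfigOp(
                new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, "604800000"),
                AlterConfigOp.OpType.SET);

            // Size-based retention: cap each partition's log at roughly 1 GiB
            AlterConfigOp setRetentionBytes = new AlterConfigOp(
                new ConfigEntry(TopicConfig.RETENTION_BYTES_CONFIG, "1073741824"),
                AlterConfigOp.OpType.SET);

            // cleanup.policy chooses the mechanism: "delete" for log retention,
            // "compact" for log compaction (both are discussed in this chapter)
            AlterConfigOp setCleanupPolicy = new AlterConfigOp(
                new ConfigEntry(TopicConfig.CLEANUP_POLICY_CONFIG, "delete"),
                AlterConfigOp.OpType.SET);

            admin.incrementalAlterConfigs(
                    Map.of(topic, List.of(setRetentionMs, setRetentionBytes, setCleanupPolicy)))
                .all()
                .get();
        }
    }
}
```

The same properties can also be set with the `kafka-configs.sh` command-line tool or as defaults in the broker configuration; the programmatic route shown here is just one option.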
10.1 Why clean up messages?
Before we get into the details of how Kafka cleans up messages, let's briefly consider why cleanup is necessary and what would happen if we never removed any messages. One reason is storage capacity: in theory, we could keep every message forever, but the log would then grow without bound and quickly exhaust the available storage.