chapter ten

10 Cleaning up messages

 

This chapter covers

  • The mechanisms behind message cleanup in Kafka
  • Options for managing message retention
  • How Kafka handles cleanup of outdated data

In Kafka, managing the lifecycle of messages is crucial for keeping the system performant and its data trustworthy. This chapter covers Kafka's two cleanup approaches: log retention and log compaction. Log retention deletes messages based on their age or on the size of the log. It is simple to configure and suits use cases such as compliance requirements and general data management. Log compaction, by contrast, removes outdated messages based on their keys, so that at least the latest message for each key is retained. Understanding the principles and configuration of both approaches lets Kafka users tailor cleanup policies to their needs, keeping storage usage under control without losing the data they still depend on.
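To make the difference between the two approaches concrete, here is a minimal Python sketch that models a log as an in-memory list. It is an illustration only, not Kafka's implementation: the `Message` class and the functions `apply_retention` and `apply_compaction` are hypothetical names invented for this example, and the sketch cleans up individual messages, whereas Kafka works on whole log segments. Only age-based retention is shown; size-based retention is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a record in a Kafka log."""
    offset: int
    key: str
    value: Optional[str]
    timestamp_ms: int


def apply_retention(log: List[Message], now_ms: int, retention_ms: int) -> List[Message]:
    """Age-based retention: drop every message older than retention_ms.

    Simplification: real Kafka deletes whole segments, not single messages.
    """
    return [m for m in log if now_ms - m.timestamp_ms <= retention_ms]


def apply_compaction(log: List[Message]) -> List[Message]:
    """Compaction: keep only the most recent message for each key."""
    latest_offset = {}
    for m in log:
        # Later messages overwrite earlier ones with the same key.
        latest_offset[m.key] = m.offset
    return [m for m in log if latest_offset[m.key] == m.offset]


log = [
    Message(0, "alice", "v1", 1000),
    Message(1, "bob", "v1", 2000),
    Message(2, "alice", "v2", 3000),
    Message(3, "carol", "v1", 4000),
    Message(4, "bob", "v2", 5000),
]

retained = apply_retention(log, now_ms=6000, retention_ms=2500)
compacted = apply_compaction(log)

print([m.offset for m in retained])   # offsets that survive retention
print([m.offset for m in compacted])  # offsets that survive compaction
```

Retention ignores keys entirely and keeps only the two newest messages, while compaction ignores age and keeps one message per key. In Kafka itself, the choice between the two is made per topic through the `cleanup.policy` setting (`delete` or `compact`), with `retention.ms` and `retention.bytes` controlling the age and size limits.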

10.1 Why do we need to clean up messages?

10.2 Kafka's cleanup methods

10.3 Log retention

10.3.1 When is a log cleaned up via retention?

10.3.2 Offset retention

10.4 Log compaction

10.4.1 When is a log cleaned up via compaction?

10.4.2 How does the log cleaner work?

10.4.3 Tombstones

10.5 Summary