2 Kafka cluster data architecture
This chapter covers
- Organizing related messages through topics
- Using partitions to process data in parallel
- The composition of Kafka messages: keys, values, and headers
- Using replication to ensure availability and fault tolerance
- Working with compacted topics for persistent data storage
Let’s step away from the business patterns for applying Kafka and explore the implementation of design ideas from an architectural perspective. We first need to understand Kafka’s core abstractions—topics and partitions. Then we can move toward how Kafka provides durability through replication and how data is ultimately stored on disk.
We use topics and partitions to process data in parallel, preserve ordering where it matters, and replicate partitions for fault tolerance. Think of topics as the destinations to which events are sent. They contain individual records, each made up of a key, a value, and optional headers, and they involve many configuration decisions, such as batching, offsets, on-disk layout, retention policies, and the number of partitions.
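To see why record keys matter for partitioning, consider how a keyed record is assigned to a partition. Kafka's default partitioner hashes the record key (using murmur2) modulo the partition count; the sketch below illustrates the same idea in Python, substituting `zlib.crc32` for murmur2 purely for simplicity. The function name `choose_partition` and the topic's partition count are illustrative assumptions, not Kafka API names.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Simplified stand-in for Kafka's default partitioner, which
    uses murmur2 rather than CRC-32.
    """
    return zlib.crc32(key) % num_partitions

# Records sharing a key always land in the same partition,
# which is what preserves per-key ordering within a topic.
num_partitions = 6
p1 = choose_partition(b"order-42", num_partitions)
p2 = choose_partition(b"order-42", num_partitions)
assert p1 == p2
assert 0 <= p1 < num_partitions
```

The important consequence: Kafka guarantees ordering only within a single partition, so choosing a key that groups related events (an order ID, a customer ID) is what makes per-entity ordering possible while still processing the topic in parallel.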