2 Kafka Cluster Data Architecture
This chapter covers
- Organizing related messages through topics
- Using partitions to process data in parallel
- The composition of Kafka messages: keys, values, and headers
- Using replication to ensure availability and fault tolerance
- Working with compacted topics for persistent data storage
Let’s start with the building blocks of Kafka from the cluster’s point of view: topics, partitions, replication, and how data is physically stored. We begin with topics and partitions: how to process data in parallel, preserve ordering where it matters, and replicate partitions for fault tolerance. Then we look inside a topic: message structure (keys, values, headers), batches and offsets, the on-disk layout, retention policies, and how to choose the number of partitions. Finally, we explain compacted topics: the rationale, the mechanics, and when compaction runs. These fundamentals are essential for understanding Kafka’s architecture.
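To make the link between keys, partitions, and ordering concrete, here is a minimal sketch of key-based partitioning. It is an illustration only: Kafka's default partitioner actually hashes the key bytes with the murmur2 algorithm, while this stand-in uses Python's `hashlib` for a deterministic hash. The function name `pick_partition` is hypothetical, not a Kafka API.

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition deterministically.

    Simplified stand-in for Kafka's partitioner: real Kafka hashes
    the key bytes with murmur2, but any stable hash shows the idea.
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Messages with the same key always land on the same partition,
# which is how Kafka preserves per-key ordering within a partition.
p1 = pick_partition(b"customer-42", num_partitions=6)
p2 = pick_partition(b"customer-42", num_partitions=6)
assert p1 == p2
```

The takeaway is that ordering in Kafka is a per-partition guarantee: by routing every message with the same key to the same partition, all events for one customer are read back in the order they were written.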
2.1 Field notes: From sketch to project—topics, partitions, and keys
The meeting room buzzed with anticipation as Max Sellington, Rob Routine, and Eva Catalyst gathered around the table, laptops open and minds focused on the task at hand: designing a proof of concept for their Customer 360 project with Apache Kafka. The idea was to pull together customer information from various sources and present it in a unified view. Today, they were diving into how to handle all this data on the Kafka servers, with plans to tackle client applications later.