2 Kafka cluster data architecture
This chapter covers
- Organizing related messages through topics
- Using partitions for parallel data processing
- The composition of Kafka messages: keys, values, and headers
- Using replication to ensure availability and fault tolerance
- Working with compacted topics for persistent data storage
Let’s start with the building blocks of Kafka from the cluster’s point of view: topics, partitions, replication, and how data is physically stored. We’ll begin with topics and partitions—how to process data in parallel, preserve ordering where it matters, and replicate partitions. Then we’ll look inside a topic: message structure (keys, values, headers), batches and offsets, the on-disk layout, retention policies, and how to select the number of partitions. Finally, we’ll look at compacted topics—the rationale, mechanics, and when compaction runs. These fundamentals are essential for grasping Kafka’s architecture.
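To see how partitioning preserves ordering where it matters, consider that Kafka's default partitioner sends messages with the same key to the same partition, so per-key order is maintained. The sketch below illustrates the idea only; the function name `choose_partition` and the byte-sum hash are stand-ins for Kafka's actual murmur2-based partitioner.

```python
# Illustration of key-based partition selection: messages with the
# same key always map to the same partition, preserving per-key order.
# The byte-sum "hash" here is a deterministic stand-in for murmur2.

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition deterministically."""
    return sum(key) % num_partitions  # stand-in for murmur2(key)

# Every message keyed "order-42" lands in the same partition,
# so a consumer sees that order's events in the sequence produced.
p1 = choose_partition(b"order-42", 6)
p2 = choose_partition(b"order-42", 6)
assert p1 == p2
```

Because only same-key ordering is guaranteed, choosing a good key (for example, an order or customer ID) is what lets you scale out with many partitions without losing the ordering your application cares about.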
2.1 Inside the Kafka cluster
In this chapter, we step back from the business patterns for applying Kafka and examine how those design ideas are implemented from an architectural perspective. We’ll explore the fundamental building blocks of Kafka:
- Topics—Destinations to which events are dispatched
- Partitions—Scalability and redundancy units
- Messages—Carriers of event information
Additionally, we will discuss two types of topics:
- Streaming topics—For event streaming
- State storage topics—For storing state
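The distinction between these two topic types maps to Kafka's `cleanup.policy` topic configuration: streaming topics use `delete` (the default), which removes old log segments after the retention period, while state storage topics use `compact`, which keeps the latest value per key. The commands below are a sketch; the topic names `page-views` and `user-profiles`, the broker address, and the partition and replication counts are illustrative assumptions.

```shell
# Streaming topic: old segments are deleted after retention
# (cleanup.policy=delete is the default, shown here for clarity).
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic page-views --partitions 6 --replication-factor 3 \
  --config cleanup.policy=delete

# State storage topic: log compaction retains the most recent
# value for each key instead of deleting records by age.
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic user-profiles --partitions 6 --replication-factor 3 \
  --config cleanup.policy=compact
```

We return to compaction in detail later in the chapter, including when the cleaner actually runs.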