2 Kafka cluster data architecture

This chapter covers

  • Organizing related messages through topics
  • Utilizing partitions for processing data in parallel
  • The composition of Kafka messages: keys, values, and headers
  • Using replication to ensure availability and fault tolerance
  • Working with compacted topics for persistent data storage

Let’s step away from the business patterns for applying Kafka and explore the implementation of design ideas from an architectural perspective. We first need to understand Kafka’s core abstractions—topics and partitions. Then we can move toward how Kafka provides durability through replication and how data is ultimately stored on disk.

We use topics and partitions to process data in parallel, preserve ordering where it matters, and replicate partitions. Think of topics as destinations where events are sent. They contain individual records—keys, values, and optional headers—and require many configuration decisions, such as batching, offsets, on-disk layout, retention policies, and the number of partitions.
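To make the link between keys and partitions concrete, here is a minimal sketch of how a keyed record is mapped to a partition: hash the key, then take the result modulo the partition count. Note that this is an illustration only; Kafka's default partitioner uses a murmur2 hash, while the sketch below substitutes CRC-32 for simplicity, and the `partition_for` helper and the `customer-42` key are made up for the example.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Simplified keyed partitioning: hash the key, then take it
    modulo the partition count. (Kafka's default partitioner uses
    murmur2; CRC-32 here just illustrates the idea.)"""
    return zlib.crc32(key) % num_partitions

# Records with the same key always map to the same partition,
# which is what preserves per-key ordering within that partition.
p1 = partition_for(b"customer-42", 6)
p2 = partition_for(b"customer-42", 6)
assert p1 == p2
```

The important property is not the particular hash function but its determinism: as long as the key and the partition count stay the same, every record for a given key lands in the same partition, so consumers see that key's events in the order they were produced.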

2.1 Inside the Kafka cluster

2.2 Core concepts of data processing

2.2.1 Partitioning the topic

2.2.2 Processing data concurrently

2.2.3 Ordering within a topic

2.2.4 AsyncAPI: Capturing the architecture of topics, partitions, and more

2.3 Replicating partitions

2.3.1 Replica leaders and followers

2.3.2 Choosing the replication factor and minimum number of in-sync replicas

2.3.3 Extending topic configuration with replication information

2.4 Inside the topic

2.4.1 Messages: Keys, values, and headers

2.4.2 First draft for documenting messages in AsyncAPI

2.4.3 Message batches and offsets

2.4.4 Physical representation of a topic

2.4.5 Data retention

2.4.6 Selecting the number of partitions

2.4.7 Configuring topic metadata

2.5 Compacted topics