2 Kafka cluster data architecture

This chapter covers

  • Organizing related messages through topics
  • Utilizing partitions for processing data in parallel
  • The composition of Kafka messages: keys, values, and headers
  • Using replication to ensure availability and fault tolerance
  • Working with compacted topics for persistent data storage

Let’s step away from the business patterns for applying Kafka and explore the implementation of design ideas from an architectural perspective. We first need to understand Kafka’s core abstractions—topics and partitions. Then we can move toward how Kafka provides durability through replication and how data is ultimately stored on disk.

We use topics and partitions to process data in parallel, preserve ordering where it matters, and replicate partitions. Think of topics as destinations where events are sent. They contain individual records—keys, values, and optional headers—and require many configuration decisions, such as batching, offsets, on-disk layout, retention policies, and the number of partitions.
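To make the link between keys and partitions concrete, here is a minimal sketch of how a keyed record is mapped to a partition: hash the key, then take the result modulo the partition count. Note that this is an illustration only; Kafka's default partitioner uses a murmur2 hash, while the sketch below substitutes CRC-32 for simplicity, and the `partition_for` helper and the `customer-42` key are made up for the example.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Simplified keyed partitioning: hash the key, then take it
    modulo the partition count. (Kafka's default partitioner uses
    murmur2; CRC-32 here just illustrates the idea.)"""
    return zlib.crc32(key) % num_partitions

# Records with the same key always map to the same partition,
# which is what preserves per-key ordering within that partition.
p1 = partition_for(b"customer-42", 6)
p2 = partition_for(b"customer-42", 6)
assert p1 == p2
```

The important property is not the particular hash function but its determinism: as long as the key and the partition count stay the same, every record for a given key lands in the same partition, so consumers see that key's events in the order they were produced.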

2.1 Inside the Kafka cluster

2.2 Core concepts of data processing

2.2.1 Partitioning the topic

2.2.2 Processing data concurrently

2.2.3 Ordering within a topic

2.2.4 AsyncAPI: Capturing the architecture of topics, partitions, and more

2.3 Replicating partitions

2.3.1 Replica leaders and followers

2.3.2 Choosing the replication factor and minimum number of in-sync replicas

2.3.3 Extending topic configuration with replication information

2.4 Inside the topic

2.4.1 Messages: Keys, values, and headers

2.4.2 First draft for documenting messages in AsyncAPI

2.4.3 Message batches and offsets

2.4.4 Physical representation of a topic

2.4.5 Data retention

2.4.6 Selecting the number of partitions

2.4.7 Configuring topic metadata

2.5 Compacted topics