2 Kafka Cluster Data Architecture

This chapter covers

  • Organizing related messages through topics
  • Utilizing partitions to process data in parallel
  • The composition of Kafka messages: keys, values, and headers
  • Using replication to ensure availability and fault tolerance
  • Working with compacted topics for persistent data storage

The meeting room buzzed with anticipation as Max Sellington, Rob Routine, and Eva Catalyst gathered around the table, laptops open and minds focused on the task at hand: designing a proof of concept for their Customer 360 project with Apache Kafka. The idea was to pull together customer information from various sources and present it in a single, unified view. That day, they were focusing on how to handle all this data on the Kafka servers, with plans to tackle client applications later.

MAX (leaning forward): Alright, team, let's dive into our proof of concept. Eva, as our data engineer, where do we begin when selecting topics for our Kafka setup?

EVA (nodding): Good question, Max. In Kafka, "topics" group related messages and serve as the destination where data is sent and stored. We need to pinpoint the key events we want to capture. Customer interactions, transactions, website visits – each could potentially become a topic.
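To make Eva's description concrete, here is a deliberately simplified, in-memory sketch in Java. It is not the Kafka client API: `TopicSketch`, `createTopic`, `send`, and `read` are hypothetical names, and the topic names are assumptions based on Eva's examples. It only illustrates the idea that each topic groups one kind of related message and stores those messages in arrival order; partitions, replication, and retention are left out on purpose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy, in-memory model of topics (hypothetical; NOT the Kafka client API).
// Each topic name maps to an append-only list of messages.
// Partitions, replication, and retention are deliberately omitted.
public class TopicSketch {
    private final Map<String, List<String>> topics = new LinkedHashMap<>();

    // Register a topic for one kind of event, e.g. "transactions".
    public void createTopic(String name) {
        topics.putIfAbsent(name, new ArrayList<>());
    }

    // Append a message to the end of the topic and return its position,
    // loosely analogous to an offset (real Kafka tracks offsets per partition).
    public int send(String topic, String message) {
        List<String> log = topics.get(topic);
        if (log == null) {
            throw new IllegalArgumentException("Unknown topic: " + topic);
        }
        log.add(message);
        return log.size() - 1;
    }

    // Read-only view of everything stored in a topic so far.
    public List<String> read(String topic) {
        return Collections.unmodifiableList(topics.getOrDefault(topic, List.of()));
    }

    public static void main(String[] args) {
        TopicSketch cluster = new TopicSketch();
        // Hypothetical topic names for the Customer 360 proof of concept
        cluster.createTopic("customer-interactions");
        cluster.createTopic("transactions");
        cluster.createTopic("website-visits");

        cluster.send("transactions", "customer=42 amount=19.99");
        cluster.send("website-visits", "customer=42 page=/pricing");
        int offset = cluster.send("transactions", "customer=7 amount=5.00");

        System.out.println("transactions: " + cluster.read("transactions"));
        System.out.println("position of last transaction: " + offset);
    }
}
```

Running `main` prints the two transaction messages in the order they were sent, followed by the position of the last one (1); the website visit lands in its own topic and does not affect that count. In a real cluster, a topic is further split into partitions, and positions are tracked per partition.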

2.1 Inside the Kafka cluster

2.2 Core concepts of data processing

2.2.1 Partitioning the topic

2.2.2 Processing data concurrently

2.2.3 Ordering within a topic

2.3 Capturing the architecture of topics, partitions, and beyond

2.3.1 Introducing AsyncAPI

2.4 Replicating partitions

2.4.1 Replica leaders and followers

2.4.2 Choosing the replication factor and minimum number of in-sync replicas

2.4.3 Extending topic configuration with replication information

2.4.4 Architecture Points: Configuring Topics

2.5 Inside the topic

2.5.1 Messages: keys, values, and headers

2.5.2 First draft for documenting messages in AsyncAPI

2.5.3 Message batches and offsets

2.5.4 Physical representation of a topic

2.5.5 Data retention

2.5.6 Selecting the number of partitions

2.5.7 Configuring topic metadata

2.5.8 Architecture Points: Advanced Topic Configuration

2.6 Compacted topics

2.6.1 The idea of compaction

2.6.2 How compaction works

2.6.3 Making decisions about compaction policy

2.7 Online resources

2.7.1 Architecture Points: Compaction

2.8 Summary