Appendix C. Understanding Kafka Streams architecture

 

In this book, you’ve learned that Kafka Streams represents a program as a directed acyclic graph of processing nodes called a topology. You’ve seen how to add processing nodes to a topology to process events from a Kafka topic. But we still need to discuss how Kafka Streams gets events into a topology, how the processing occurs, and how processed events are written back to a Kafka topic. We’ll take a deeper look at these questions in this appendix.
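To make the topology idea concrete, here’s a minimal sketch of building one with the `StreamsBuilder` API. The topic names and the uppercase transform are illustrative placeholders, not examples from the book; the point is only that each operator you add becomes a node in the graph:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class TopologySketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Source node: consume events from a Kafka topic (name is illustrative)
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               // Processing node: a simple stateless transformation
               .mapValues(value -> value.toUpperCase())
               // Sink node: write processed events back to a Kafka topic
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        // Build the topology and print its node graph
        Topology topology = builder.build();
        System.out.println(topology.describe());
    }
}
```

Calling `topology.describe()` prints the source, processor, and sink nodes and how they connect, which is a handy way to see the graph you’ve assembled before running it.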

Here’s an illustration showing a high-level view of what we’re going to discuss:

Figure C.1 Componentized view of a Kafka Streams application; there are three sections: consuming, processing, and producing

As you can see from the illustration, at a high level, we can break up how a Kafka Streams application works into three categories:

  1. Consuming events from a Kafka topic
  2. Assigning, distributing, and processing events
  3. Producing processed results to a Kafka topic

Given that we’ve already covered the Kafka clients in a previous chapter, and that Kafka Streams is an abstraction over them, we won’t get into those details here. Instead, I’ll combine points one and three into a more general discussion of the clients and then go deeper into the Kafka Streams architecture for point two.
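One practical consequence of Kafka Streams being an abstraction over the clients is that you configure its embedded consumer and producer through the same `Properties` object you pass to the application, using prefixed configuration keys. A minimal sketch, where the application id, broker address, and the particular overrides are illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class ClientConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();

        // Required Kafka Streams settings (values here are placeholders)
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "appendix-c-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Tune the embedded consumer via the "consumer." prefix
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), "500");

        // Tune the embedded producer via the "producer." prefix
        props.put(StreamsConfig.producerPrefix(ProducerConfig.LINGER_MS_CONFIG), "100");
    }
}
```

The `consumerPrefix` and `producerPrefix` helpers route a setting to the corresponding embedded client, so you rarely need to touch the clients directly.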

C.1 Consumer and producer clients in Kafka Streams

C.2 Assigning, distributing and processing events

C.3 Threads in Kafka Streams - the Stream Thread

C.4 Processing records
