This chapter covers
- Introducing the Kafka Streams API
- Building our first Kafka Streams application
- Working with customer data; creating more complex applications
- Splitting, merging, and branching streams, oh my!
Simply stated, a Kafka Streams application is a graph of processing nodes that transforms event data as it streams through each node. Let’s take a look at an illustration of what this means:
Figure 6.1. A Kafka Streams application is a graph with a source node, any number of processing nodes, and a sink node
This illustration represents the generic structure of most Kafka Streams applications. There is a source node that consumes event records from a Kafka broker, followed by any number of processing nodes, each performing a distinct task, and finally a sink node used to write the transformed records back out to Kafka. In a previous chapter we discussed how to use the Kafka clients for producing and consuming records with Kafka. Much of what you learned in that chapter applies to Kafka Streams, because at its heart, Kafka Streams is an abstraction over the producers and consumers, leaving you free to focus on your stream processing requirements.
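To make the source-processor-sink structure concrete, here is a minimal sketch of such a graph built with the Kafka Streams `StreamsBuilder` API. The topic names (`input-topic`, `output-topic`) and the uppercasing transformation are placeholder choices for illustration, not anything prescribed by the library:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class SimpleTopology {

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Source node: consumes records from a Kafka topic
        // ("input-topic" is a hypothetical topic name)
        KStream<String, String> input =
            builder.stream("input-topic",
                           Consumed.with(Serdes.String(), Serdes.String()));

        // Processing node: transforms each record as it flows through
        KStream<String, String> upperCased =
            input.mapValues(value -> value.toUpperCase());

        // Sink node: writes the transformed records back out to Kafka
        upperCased.to("output-topic",
                      Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}
```

Each call on the `KStream` adds a node to the graph; the records themselves only start flowing once the topology is handed to a running `KafkaStreams` instance.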
Important
While Kafka Streams is the native stream processing library for Apache Kafka®, it does not run inside the cluster or on the brokers; it connects to the cluster as a client application.
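Because a Kafka Streams application is just a client, starting one amounts to pointing it at the brokers and giving it an application id. A minimal sketch follows; the application id and broker address are assumed values, and actually running `main` requires a broker listening at the configured address:

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsClientSetup {

    public static Properties config() {
        Properties props = new Properties();
        // Identifies this application to the cluster; consumer group
        // membership and internal topic names derive from it
        // ("first-streams-app" is a hypothetical name)
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "first-streams-app");
        // Brokers to connect to (assumes a local broker on the default port)
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        return props;
    }

    public static void main(String[] args) {
        // A trivial pass-through topology for illustration
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("input-topic").to("output-topic");

        // The application runs in this JVM, not on the brokers
        KafkaStreams streams = new KafkaStreams(builder.build(), config());
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Scaling the application is equally broker-free: you start more instances of the same client, and Kafka's consumer group protocol divides the work among them.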
In this chapter, you’ll learn how to build such a graph that makes up a stream processing application with Kafka Streams.