chapter eight

8 Designing streaming applications

 

This chapter covers

  • Real-time processing
  • Designing streaming applications
  • The architecture of the Kafka Streams framework
  • Exploring ksqlDB and Apache Flink for real-time data processing

To implement real-time processing use cases, you need a clear grasp of the underlying building blocks for streaming on Kafka so you can plan how the system’s data will be transformed and aggregated. In particular, you need to know when, where, and how to transform, join, and aggregate events as they arrive. The big decision at this point is whether to use a dedicated stream-processing framework or traditional service code. If you decide to use Kafka Streams, you’ll need to understand its processing model, including stateless operators like map and filter as well as stateful capabilities such as joins, windowing, and aggregations. The Processor API is useful when you want to implement custom logic, for example, invoking an external service to detect anomalies in events.
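The difference between stateless and stateful processing can be illustrated with a minimal sketch in plain Python. It does not use the Kafka Streams API; the event shape, the `is_valid` and `to_cents` helpers, and the 60-second window size are all hypothetical choices made for illustration. The filter and map steps look at one event at a time, while the windowed count has to remember what it has already seen.

```python
from collections import defaultdict

# Hypothetical events: (customer_id, timestamp_ms, amount)
events = [
    ("alice", 1_000, 20.0),
    ("bob", 2_500, -5.0),
    ("alice", 4_000, 7.5),
    ("alice", 61_000, 3.0),
    ("bob", 62_000, 12.0),
]

# Stateless: each decision depends only on the current event.
def is_valid(event):
    return event[2] > 0

def to_cents(event):
    key, ts, amount = event
    return (key, ts, round(amount * 100))

# Stateful: counting per key and window requires state kept across events.
WINDOW_MS = 60_000  # assumed tumbling window size

def count_per_window(stream):
    counts = defaultdict(int)
    for key, ts, _ in stream:
        window_start = ts - ts % WINDOW_MS  # align to the window boundary
        counts[(key, window_start)] += 1
    return dict(counts)

valid = [to_cents(e) for e in events if is_valid(e)]
window_counts = count_per_window(valid)
```

In a real stream processor, the `counts` dictionary corresponds to state that the framework must persist and restore after a failure, which is one reason stateful operations need more design attention than stateless ones.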

There are, of course, alternatives to Kafka Streams, and we’ll also look at when it might make more sense to use ksqlDB, Apache Flink, or managed cloud services.

8.1 Introducing Kafka Streams

8.1.1 ETL, ELT, and stream processing

8.1.2 The Kafka Streams framework

8.1.3 Benefits of using Kafka Streams

8.2 Sketching out the ODS with Kafka Streams

8.3 Processing data

8.3.1 Stateless operations

8.3.2 Stateful operations

8.3.3 The Processor API

8.3.4 Kafka Streams internal architecture

8.3.5 Windowing operations

8.3.6 Joining streams

8.3.7 Implementing CustomerJoinService in the example ODS

8.3.8 Interactive queries

8.4 Alternative solutions

8.4.1 Confluent ksqlDB

8.4.2 Apache Flink

8.4.3 Solutions from cloud providers

8.5 Common streaming application challenges