8 Designing Streaming Applications

 

This chapter covers

  • An introduction to real-time processing and its key principles
  • The design for building streaming applications
  • The architecture of the Kafka Streams framework
  • Exploring ksqlDB and Apache Flink for real-time data processing

To implement real-time processing use cases, you need a clear grasp of the underlying concepts and frameworks. Here we’ll cover the building blocks for streaming on Kafka—how to transform, join, and aggregate events as they arrive. You’ll weigh dedicated stream-processing frameworks against traditional service code. Using Kafka Streams, we explain core concepts and operators—stateless (map, filter) and stateful (joins, windows, aggregates)—and when to use the Processor API. Finally, we compare alternatives—ksqlDB, Apache Flink, and managed cloud services—with guidance on when to choose them over Kafka Streams.

8.1 Field notes: Transforming data in motion

The team gathered for their regular meeting, notebooks and laptops at the ready. Today’s topic quickly turned to the question of how to transform and aggregate their data as part of the Customer360 project.

Eva: Since we’re doing all this research, I think it’s worth considering transforming the data not just at the final service but somewhere in the middle. We could use a streaming framework for that.

Max: A streaming framework? What do you mean by that, Eva?

8.2 Introducing Kafka Streams

8.2.1 ETL, ELT and Stream Processing

8.2.2 Introduction to the Kafka Streams framework

8.2.3 Benefits of using Kafka Streams

8.3 Field notes: Sketching out Customer360 with Kafka Streams

8.4 Processing data

8.4.1 Stateless Operations

8.4.2 Stateful operations

8.4.3 Processing API

8.4.4 Kafka Streams internal architecture

8.4.5 Windowing operations

8.4.6 Joining streams

8.4.7 Field notes: Implementing CustomerJoinService

8.4.8 Interactive Queries

8.5 Alternative Solutions

8.5.1 Confluent ksqlDB

8.5.2 Apache Flink

8.5.3 Solutions from Cloud Providers

8.6 Common streaming application issues

8.6.1 Memory and disk capacity planning

8.6.2 Incorrect topic partitioning

8.6.3 Out-of-order data

8.6.4 Late arriving data

8.6.5 State Store initialization

8.6.6 Monitoring and debugging challenges

8.7 Online resources

8.8 Summary