chapter two

2 A walk through Kafka and its ecosystem

This chapter covers

Storing events at scale with Apache Kafka
Setting up streaming connectors with Kafka Connect
Processing events in real time with Kafka Streams
Getting up and running with Apache Kafka on your local machine

The first chapter presented the building blocks of streaming data pipelines: event storage, connectors, and stream processors. This chapter dives deeper into the technologies you can use to implement them. We focus mainly on storing events with Apache Kafka, implementing connectors with Kafka Connect, and processing events with Kafka Streams. For completeness, we cover alternative technologies as well.

Rather than covering every detail of Apache Kafka, Kafka Connect, and Kafka Streams, we focus on the aspects that are essential for using these technologies successfully in streaming data pipelines. Throughout the book, we develop a practical case study. This chapter helps you set up the required technologies on your local machine using Docker containers.

2.1 Storing events with Apache Kafka

2.1.1 Brokers

2.1.2 Records

2.1.3 Topics

2.1.4 Producers

2.1.5 Consumers

2.1.6 Deployment options

2.1.7 Other technologies

2.2 Integrating streaming data pipelines with Kafka Connect

2.2.1 Tasks and workers

2.2.2 Configuring connectors

2.2.3 Single-message transforms

2.2.4 Deployment options

2.3 Processing events with Kafka Streams

2.3.1 Deployment options

2.3.2 Other technologies

2.4 Summary