3 Integrating data systems in real time with Kafka Connect

 

This chapter covers

  • Extracting data from database systems with change data capture
  • Turning data systems at rest into streams of change events
  • Publishing change events to external data systems
  • Transforming data with Kafka Connect

In the last chapter, we looked at Apache Kafka and its ecosystem from an architectural point of view and learned how to combine Kafka, Kafka Connect, and Kafka Streams to build powerful streaming data pipelines. We walked through a few practical examples of using the low-level producer and consumer APIs for writing data to and reading data from Kafka.

Kafka Connect is a connector framework built on top of Kafka's producer and consumer APIs. It plays an essential role in the Kafka ecosystem by acting as the entry and exit point of streaming architectures: source connectors extract change events from external systems, such as databases, and produce them to Kafka topics, while sink connectors consume events from Kafka topics and publish them to external systems, such as data warehouses.
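Deploying a connector is a matter of configuration rather than code. As a minimal sketch, the request below registers a hypothetical Debezium PostgreSQL source connector with a Connect worker assumed to be listening on localhost:8083; the connector name, hostname, credentials, table list, and topic prefix are placeholders, and exact property names can differ between Debezium versions:

# Registers a source connector via the Connect worker's REST API (placeholder values)
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "inventory-source",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "postgres",
      "database.password": "postgres",
      "database.dbname": "inventory",
      "topic.prefix": "inventory",
      "table.include.list": "public.orders,public.customers"
    }
  }'

A sink connector is registered in the same way; only the connector class and its target-specific settings change. We will look at the individual configuration options in detail later in this chapter.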

3.1 Meet our case study: Building a streaming data pipeline for an e-commerce business

 
 

3.2 Capturing changes from transactional databases with Debezium

 

3.2.1 Debezium in action

 
 

3.2.2 Format of change events

 
 
 

3.2.3 Logical decoding and replication slots in PostgreSQL

 
 
 
 

3.2.4 Streaming record-level change events

 
 
 

3.2.5 Snapshots

 
 

3.2.6 Configuring Debezium

 
 
 

3.3 Single message transforms in Kafka Connect: When to use and when to avoid

 
 
 

3.4 Streaming records to data sinks

 
 

3.5 Summary

 
 