3 Integrating data systems in real time with Kafka Connect

 

This chapter covers

  • Extracting data from database systems with change data capture
  • Turning data systems at rest into streams of change events
  • Publishing change events to external data systems
  • Transforming data with Kafka Connect

In the last chapter, we looked at Apache Kafka and its ecosystem from an architectural point of view and learned how to combine Kafka, Kafka Connect, and Kafka Streams to build powerful streaming data pipelines. We walked through a few practical examples of using the low-level producer and consumer APIs for writing data to and reading data from Kafka.

Kafka Connect is a connector framework built on top of Kafka's producer and consumer APIs. It plays an essential role in the Kafka ecosystem by acting as the entry and exit point of streaming architectures: source connectors extract change events from external systems, such as databases, and produce them to Kafka topics, while sink connectors consume events from Kafka topics and publish them to external systems, such as data warehouses.
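Deploying a connector is a matter of configuration rather than code. As a minimal sketch, the request below registers a hypothetical Debezium PostgreSQL source connector with a Connect worker assumed to be listening on localhost:8083; the connector name, hostname, credentials, table list, and topic prefix are placeholders, and exact property names can differ between Debezium versions:

# Registers a source connector via the Connect worker's REST API (placeholder values)
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "inventory-source",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "postgres",
      "database.password": "postgres",
      "database.dbname": "inventory",
      "topic.prefix": "inventory",
      "table.include.list": "public.orders,public.customers"
    }
  }'

A sink connector is registered in the same way; only the connector class and its target-specific settings change. We will look at the individual configuration options in detail later in this chapter.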

3.1 Meet our case study: Building a streaming data pipeline for an e-commerce business

 
 

3.2 Capturing changes from transactional databases with Debezium

 

3.2.1 Debezium in action

 
 

3.2.2 Format of change events

 
 
 

3.2.3 Logical decoding and replication slots in PostgreSQL

 
 
 
 

3.2.4 Streaming record-level change events

 
 
 

3.2.5 Snapshots

 
 

3.2.6 Configuring Debezium

 
 
 

3.3 Single message transforms in Kafka Connect: When to use and when to avoid

 
 
 

3.4 Streaming records to data sinks

 
 

3.5 Summary

 
 