3 Integrating data systems in real time with Kafka Connect
This chapter covers
- Extracting data from database systems with change data capture
- Turning data systems at rest into streams of change events
- Publishing change events to external data systems
- Transforming data with Kafka Connect
In the last chapter, we looked at Apache Kafka and its ecosystem from an architectural point of view and learned how to combine Kafka, Kafka Connect, and Kafka Streams to build powerful streaming data pipelines. We walked through a few practical examples of using the low-level producer and consumer APIs for writing data to and reading data from Kafka.
Kafka Connect is a connector framework built on top of Kafka’s producer and consumer APIs. It plays an essential role in the Kafka ecosystem by acting as the entry and exit point of streaming architectures: source connectors extract change events from external systems, such as databases, and produce them to Kafka topics, while sink connectors consume events from Kafka topics and publish them to external systems, such as data warehouses.
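To make this concrete, the following sketch registers a source connector by posting its configuration to Kafka Connect’s REST API. The worker URL, connector name, and database credentials are illustrative assumptions, and the configuration is abridged; a real Debezium MySQL deployment requires additional properties (for example, the schema history topic settings), which vary by connector version.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSourceConnector {

    // Assumed address of a locally running Connect worker (default REST port 8083).
    private static final String CONNECT_URL = "http://localhost:8083/connectors";

    public static void main(String[] args) throws Exception {
        // Abridged example configuration for a Debezium MySQL source connector;
        // connector name, credentials, and database are hypothetical placeholders.
        String config = """
            {
              "name": "inventory-source",
              "config": {
                "connector.class": "io.debezium.connector.mysql.MySqlConnector",
                "database.hostname": "localhost",
                "database.port": "3306",
                "database.user": "debezium",
                "database.password": "dbz",
                "database.include.list": "inventory",
                "topic.prefix": "inventory"
              }
            }
            """;

        // Kafka Connect exposes a REST API; POST /connectors creates a connector.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(CONNECT_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A 201 Created status indicates the connector was registered.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Note that no custom producer code is involved: once registered, the connector runs inside the Connect worker, and the framework takes care of offset tracking, serialization, and delivery of the change events to the configured Kafka topics.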