This chapter covers
- An overview of stream processing frameworks
- Partitioning and parallelization mechanisms in Kafka Streams
- Implementing SQL-like queries in stream processing
- Demonstrating use cases for Kafka Streams
We now know different ways to populate Kafka with data. We can use producers to send data directly to Kafka, which makes the most sense when we’re near the data source. For example, the machines in our factory are equipped with Kafka producers that send measurement data and events straight to Kafka, and we also use producers to collect log data from servers or to track website visits. On the other hand, if we want to ingest data from databases, files, or cloud services into Kafka, it’s worth taking a look at Kafka Connect.
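To make this concrete, here is a minimal producer sketch in Java. It assumes a broker reachable at localhost:9092; the topic name machine.measurements, the key, and the payload are illustrative placeholders, not part of a real factory setup.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MeasurementProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key: machine ID; value: measurement payload (both hypothetical)
            producer.send(new ProducerRecord<>(
                "machine.measurements", "machine-1", "temperature=72.4"));
        } // closing the producer flushes any buffered records
    }
}
```

Using the machine ID as the key ensures that all measurements from the same machine land in the same partition and therefore stay in order.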
Similarly, we’re familiar with various methods for reading data from Kafka and making it available to third-party systems. We often use Kafka consumers to display data directly or to trigger actions in third-party systems. However, when we want to write data from Kafka to other systems, we advise our customers to consider Kafka Connect, as it’s often a more suitable approach than implementing custom consumers.
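As a counterpart to the producer above, here is a minimal consumer sketch that prints incoming records. It again assumes a local broker and the hypothetical machine.measurements topic; the consumer group name is likewise illustrative.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MeasurementConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed local broker
        props.put("group.id", "measurement-display");      // hypothetical group name
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("machine.measurements"));
            while (true) {
                // Poll for new records and display them
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s -> %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```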
With these tools, we have numerous ways to implement high-performance, useful systems. We can exchange data between different systems in near real time or build modern integration pipelines. Originally, Kafka was used to feed massive volumes of data into big data systems such as Hadoop, where it was batch-processed later.