This chapter covers
- Receiving messages from Kafka
- Principles of parallel message reception
- Common challenges in Kafka consumer handling
- Accessing Kafka via HTTP
- Utilizing data compression in Kafka
Here we’ll shift from producers to the other half of the pipeline: consumers. We need to understand how they read at scale, coordinate with one another, and stay correct. How are messages received and processed in parallel? How do applications subscribe to topics or explicitly position themselves in a stream? How do batching and poll timeouts shape throughput and latency? To answer these questions, we’ll make consumer groups concrete, explore rebalances, and examine the most common problems: lag, duplicates, and ordering. We’ll also look at Confluent REST Proxy as a lightweight option for accessing Kafka over HTTP. By the end of this chapter, you’ll know how to design consumer logic that is reliable, efficient, and easy to operate.
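As a preview of the ideas ahead, here is a minimal sketch of the subscribe-and-poll loop that consumer applications build on: subscribing joins a consumer group, and the `poll()` timeout bounds how long the consumer blocks when no data is available. The broker address, group id, and topic name are placeholders for illustration, not values from our project.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MinimalConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "customer-360-ods");        // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribing joins the consumer group; partitions are assigned automatically.
            consumer.subscribe(List.of("customer-profiles")); // hypothetical topic name

            while (true) {
                // poll() returns a batch of records; the timeout caps how long we
                // block when no data is available, trading latency for efficiency.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```

We’ll unpack each piece of this loop, including group coordination, offset management, and explicit positioning with `assign()` and `seek()`, as the chapter progresses.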
In our ODS example, we’ll focus on the next stage of the project: developing the consumer application. With topics in place for customer profile data and transactions, the Customer 360 team now turns to the intricacies of data aggregation, determining how best to handle and merge the incoming streams. After weighing the trade-offs, the next step is to design the consumer application, focusing on aggregating the data and processing it in client code.