Chapter 17. Micro-batch stream processing: Illustration
This chapter covers
- Trident, Apache Storm’s micro-batch-processing API
- Integrating Kafka, Trident, and Cassandra
- Fault-tolerant, task-local state
In the last chapter you learned the core concepts of micro-batch processing. By processing tuples in a series of small batches, you can achieve exactly-once processing semantics. By maintaining a strong ordering on the processing of batches and storing the batch ID alongside your state, you can determine whether a batch has already been applied. This lets you avoid applying an update more than once, thereby achieving exactly-once semantics.
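To make the idea concrete, here is a minimal sketch of the batch-ID check described above. The class and method names are hypothetical, and the state is kept in memory purely for illustration; in a real system the count and its batch ID would be persisted together in a database.

```java
// Hypothetical sketch: a counter whose state stores the ID of the last
// batch applied, so a retried batch is never counted twice.
public class BatchIdCheckedCounter {
    private long storedBatchId = -1; // batch ID persisted alongside the state
    private long count = 0;          // the state itself

    public void applyBatch(long batchId, long delta) {
        if (batchId <= storedBatchId) {
            // This batch was already applied before a failure and retry,
            // so skip it to preserve exactly-once semantics.
            return;
        }
        count += delta;
        storedBatchId = batchId; // update the batch ID together with the state
    }
}
```

Because batches are processed in a strong order, a batch ID less than or equal to the stored one can only mean the update was already applied, so skipping it is always safe.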
You also saw how, with some minor extensions, pipe diagrams can represent micro-batch streaming computations. These pipe diagrams let you reason about your computation as if every tuple were processed exactly once, while they compile to code that automatically handles the nitty-gritty details of failures, retries, and all the batch-ID logic.
Now you’ll learn about Trident, Apache Storm’s micro-batching API, which provides an implementation of these extended pipe diagrams. You’ll see how similar it is to normal batch processing, and how to integrate it with stream sources like Kafka and state providers like Cassandra.
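As a taste of what’s ahead, the sketch below follows the canonical Trident word-count example from the Storm documentation. Package names vary between Storm versions, and the in-memory spout and state are stand-ins for the Kafka and Cassandra integrations covered later in this chapter.

```java
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class WordCountSketch {
  // A simple user-defined function that splits a sentence into words.
  public static class Split extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
      for (String word : tuple.getString(0).split(" ")) {
        collector.emit(new Values(word));
      }
    }
  }

  public static TridentTopology build() {
    // A fixed, in-memory spout standing in for a real source like Kafka.
    FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
        new Values("the cow jumped over the moon"),
        new Values("four score and seven years ago"));
    spout.setCycle(true);

    TridentTopology topology = new TridentTopology();
    topology.newStream("sentences", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        // persistentAggregate maintains the counts with exactly-once semantics;
        // MemoryMapState stands in for a durable state provider like Cassandra.
        .persistentAggregate(new MemoryMapState.Factory(),
                             new Count(),
                             new Fields("count"));
    return topology;
  }
}
```

Notice how closely this reads like a batch computation: a stream is transformed, grouped, and aggregated, while Trident handles batching, retries, and batch-ID bookkeeping behind the scenes.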