Chapter 17. Micro-batch stream processing: Illustration
This chapter covers
- Trident, Apache Storm’s micro-batch-processing API
- Integrating Kafka, Trident, and Cassandra
- Fault-tolerant, task-local state
In the last chapter you learned the core concepts of micro-batch processing. By processing tuples in a series of small batches, you can achieve exactly-once processing semantics. By maintaining a strong ordering on the processing of batches and storing the batch ID alongside your state, you can determine whether a batch has already been applied. This lets you avoid applying an update more than once, thereby achieving exactly-once semantics.
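To make the idea concrete, here is a minimal sketch of the batch-ID check described above. The class and method names are hypothetical, and the state is kept in memory purely for illustration; in a real system the count and its batch ID would be persisted together in a database.

```java
// Hypothetical sketch: a counter whose state stores the ID of the last
// batch applied, so a retried batch is never counted twice.
public class BatchIdCheckedCounter {
    private long storedBatchId = -1; // batch ID persisted alongside the state
    private long count = 0;          // the state itself

    public void applyBatch(long batchId, long delta) {
        if (batchId <= storedBatchId) {
            // This batch was already applied before a failure and retry,
            // so skip it to preserve exactly-once semantics.
            return;
        }
        count += delta;
        storedBatchId = batchId; // update the batch ID together with the state
    }
}
```

Because batches are processed in a strong order, a batch ID less than or equal to the stored one can only mean the update was already applied, so skipping it is always safe.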
You also saw how, with some minor extensions, pipe diagrams can represent micro-batch streaming computations. These pipe diagrams let you reason about your computation as if every tuple were processed exactly once, while they compile to code that automatically handles the nitty-gritty details of failures, retries, and all the batch-ID logic.
Now you’ll learn about Trident, Apache Storm’s micro-batching API, which provides an implementation of these extended pipe diagrams. You’ll see how similar it is to normal batch processing, and how to integrate it with stream sources like Kafka and state providers like Cassandra.
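As a taste of what’s ahead, the sketch below follows the canonical Trident word-count example from the Storm documentation. Package names vary between Storm versions, and the in-memory spout and state are stand-ins for the Kafka and Cassandra integrations covered later in this chapter.

```java
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class WordCountSketch {
  // A simple user-defined function that splits a sentence into words.
  public static class Split extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
      for (String word : tuple.getString(0).split(" ")) {
        collector.emit(new Values(word));
      }
    }
  }

  public static TridentTopology build() {
    // A fixed, in-memory spout standing in for a real source like Kafka.
    FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
        new Values("the cow jumped over the moon"),
        new Values("four score and seven years ago"));
    spout.setCycle(true);

    TridentTopology topology = new TridentTopology();
    topology.newStream("sentences", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        // persistentAggregate maintains the counts with exactly-once semantics;
        // MemoryMapState stands in for a durable state provider like Cassandra.
        .persistentAggregate(new MemoryMapState.Factory(),
                             new Count(),
                             new Fields("count"));
    return topology;
  }
}
```

Notice how closely this reads like a batch computation: a stream is transformed, grouped, and aggregated, while Trident handles batching, retries, and batch-ID bookkeeping behind the scenes.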