Chapter 17. Micro-batch stream processing: Illustration

 

This chapter covers

  • Trident, Apache Storm’s micro-batch-processing API
  • Integrating Kafka, Trident, and Cassandra
  • Fault-tolerant task local state

In the last chapter you learned the core concepts of micro-batch processing: by processing tuples in a series of small batches, you can achieve exactly-once processing semantics. By maintaining a strong ordering on the processing of batches and storing the ID of the last-applied batch alongside your state, you can tell whether a batch has already been processed, and so avoid ever applying its updates more than once.
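To make the batch-ID idea concrete, here is a minimal sketch in Python. This is not the Trident API — the class and method names are invented for illustration — but it shows the core trick: store the ID of the last batch applied next to each value, and skip any batch whose ID matches what is already stored (a retry under strong ordering).

```python
class ExactlyOnceCounter:
    """Illustrative only: keeps a count per key, plus the ID of the
    last batch applied to that key, so retried batches are idempotent."""

    def __init__(self):
        # key -> (last_applied_batch_id, count)
        self.state = {}

    def apply_batch(self, batch_id, key, increment):
        last_id, count = self.state.get(key, (None, 0))
        if last_id == batch_id:
            # This batch was already applied; this is a retry after a
            # failure downstream, so skip the update.
            return count
        # Strong ordering guarantees batch N is only retried before
        # batch N+1 starts, so a simple equality check suffices.
        self.state[key] = (batch_id, count + increment)
        return count + increment

counter = ExactlyOnceCounter()
counter.apply_batch(1, "page-view", 5)   # applied: count is 5
counter.apply_batch(1, "page-view", 5)   # retry of batch 1: still 5
counter.apply_batch(2, "page-view", 3)   # new batch: count is 8
```

In a real system the `(batch_id, count)` pair would live in a database such as Cassandra, with both values written atomically in a single row, so that a crash between "write count" and "write batch ID" cannot leave them inconsistent.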

You saw how, with some minor extensions, pipe diagrams can represent micro-batch streaming computations. These pipe diagrams let you reason about your computation as if every tuple were processed exactly once, while they compile to code that automatically handles the nitty-gritty details of failures, retries, and all the batch ID logic.

Now you’ll learn about Trident, Apache Storm’s micro-batching API, which provides an implementation of these extended pipe diagrams. You’ll see how similar it is to normal batch processing, and how to integrate it with stream sources like Kafka and state providers like Cassandra.

17.1. Using Trident

 
 
 

17.2. Finishing the SuperWebAnalytics.com speed layer

 
 

17.3. Fully fault-tolerant, in-memory, micro-batch processing

 

17.4. Summary

 
 
 
 