Chapter 16. Micro-batch stream processing
This chapter covers
- Exactly-once processing semantics
- Micro-batch processing and its trade-offs
- Extending pipe diagrams for micro-batch stream processing
In the last four chapters you’ve learned the main concepts of the speed layer: realtime views, incremental algorithms, stream processing, and how they all fit together. There are no more fundamental concepts to learn about the speed layer—instead, in this chapter we’ll focus on a different method of stream processing that makes certain trade-offs to gain benefits like improved accuracy and higher throughput.
The one-at-a-time stream processing you’ve learned has very low latency and is simple to understand. But it can only provide an at-least-once processing guarantee during failures. Although this doesn’t affect accuracy for idempotent operations, like adding elements to a set, it does affect accuracy for non-idempotent operations such as counting. In many cases this inaccuracy is unimportant, because the batch layer overrides the speed layer, making the inaccuracy temporary. But there are other cases where you want full accuracy all of the time, and temporary inaccuracy is unacceptable. In those cases, micro-batch stream processing can give you the fault-tolerant accuracy you need, at the cost of higher latency on the order of hundreds of milliseconds to seconds.
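To see concretely why at-least-once semantics affect counting but not set operations, consider the following sketch. It simulates a consumer that sees one event twice because the stream replays it after a failure (the class and names here are illustrative, not from any real stream-processing API): the count ends up inflated by the replay, while the set of unique users stays correct because adding an element already in a set has no effect.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of at-least-once delivery: the event
// stream replays "u2" after a simulated failure, so the consumer
// processes it twice.
public class AtLeastOnceDemo {
    static long pageviewCount = 0;                   // non-idempotent: counting
    static Set<String> uniqueUsers = new HashSet<>(); // idempotent: set add

    static void consume(String userId) {
        pageviewCount += 1;      // a replayed event inflates the count
        uniqueUsers.add(userId); // a replayed event has no effect here
    }

    public static void main(String[] args) {
        // Events as seen by the consumer: only 3 distinct events
        // occurred, but "u2" is delivered twice due to replay.
        String[] delivered = {"u1", "u2", "u2", "u3"};
        for (String userId : delivered) {
            consume(userId);
        }
        System.out.println("count = " + pageviewCount);        // 4, off by one
        System.out.println("uniques = " + uniqueUsers.size()); // 3, still correct
    }
}
```

The count is permanently wrong after the replay, but the set-based answer is unaffected—this is exactly the distinction between operations that tolerate at-least-once semantics and those that need the exactly-once guarantee micro-batch processing provides.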