Chapter 16. Micro-batch stream processing
This chapter covers
- Exactly-once processing semantics
- Micro-batch processing and its trade-offs
- Extending pipe diagrams for micro-batch stream processing
In the last four chapters you’ve learned the main concepts of the speed layer: realtime views, incremental algorithms, stream processing, and how they all fit together. There are no more fundamental concepts to learn about the speed layer—instead, in this chapter we’ll focus on a different method of stream processing that makes certain trade-offs to gain benefits like improved accuracy and higher throughput.
The one-at-a-time stream processing you’ve learned has very low latency and is simple to understand. But it can only provide an at-least-once processing guarantee during failures. Although this doesn’t affect accuracy for idempotent operations, like adding elements to a set, it does affect accuracy for non-idempotent operations such as counting. In many cases this inaccuracy is unimportant, because the batch layer overrides the speed layer, making the inaccuracy temporary. But there are other cases where you want full accuracy all of the time, and temporary inaccuracy is unacceptable. In those cases, micro-batch stream processing can give you the fault-tolerant accuracy you need, at the cost of higher latency on the order of hundreds of milliseconds to seconds.
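To see concretely why at-least-once semantics affect counting but not set operations, consider the following sketch. It simulates a consumer that sees one event twice because the stream replays it after a failure (the class and names here are illustrative, not from any real stream-processing API): the count ends up inflated by the replay, while the set of unique users stays correct because adding an element already in a set has no effect.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of at-least-once delivery: the event
// stream replays "u2" after a simulated failure, so the consumer
// processes it twice.
public class AtLeastOnceDemo {
    static long pageviewCount = 0;                   // non-idempotent: counting
    static Set<String> uniqueUsers = new HashSet<>(); // idempotent: set add

    static void consume(String userId) {
        pageviewCount += 1;      // a replayed event inflates the count
        uniqueUsers.add(userId); // a replayed event has no effect here
    }

    public static void main(String[] args) {
        // Events as seen by the consumer: only 3 distinct events
        // occurred, but "u2" is delivered twice due to replay.
        String[] delivered = {"u1", "u2", "u2", "u3"};
        for (String userId : delivered) {
            consume(userId);
        }
        System.out.println("count = " + pageviewCount);        // 4, off by one
        System.out.println("uniques = " + uniqueUsers.size()); // 3, still correct
    }
}
```

The count is permanently wrong after the replay, but the set-based answer is unaffected—this is exactly the distinction between operations that tolerate at-least-once semantics and those that need the exactly-once guarantee micro-batch processing provides.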