Chapter 4. Creating robust topologies


This chapter covers

  • Guaranteed message processing
  • Fault tolerance
  • Replay semantics

So far, we’ve defined many of Storm’s core concepts. Along the way, we’ve implemented two separate topologies, each of which runs in a local cluster. This chapter is no different in that we’ll be designing and implementing another topology for a new scenario. But the problem we’re solving has stricter requirements: we must guarantee that tuples are processed and that the topology tolerates faults. To help us meet these requirements, we’ll introduce some new concepts related to reliability and failure. You’ll learn about the tools Storm gives us to handle failure, and we’ll also dive into the various types of guarantees we can make about processing data. Armed with this knowledge, we’ll be ready to venture out into the world and create production-quality topologies.

4.1. Requirements for reliability

In the previous chapter, our heat map application needed to quickly process a large amount of time-sensitive data. Further, merely sampling a portion of that data could provide us with what we needed: an approximation of the popularity of establishments within a given geographic area right now. If we failed to process a given tuple within a short time window, it lost its value. The heat map was all about right now. We didn’t need to guarantee that each message was processed; processing most of them was good enough.

4.2. Problem definition: a credit card authorization system

4.3. Basic implementation of the bolts

4.4. Guaranteed message processing

4.5. Replay semantics

4.6. Summary