5 Fault tolerance

 

This chapter covers

  • Self-healing systems and the let-it-crash principle
  • The actor lifecycle signals
  • Supervising strategies and their signals
  • Monitoring and watching

This chapter explores Akka’s tools for making applications more resilient: supervision, monitoring, and the actor lifecycle features, all of which follow the let-it-crash principle. We look at examples that show how to apply them to typical failure scenarios.

NOTE

The source code for this chapter is available at www.manning.com/books/akka-in-action-second-edition or https://github.com/franciscolopezsancho/akka-topics/tree/main/chapter05. You can find the contents of any snippet or listing in the .scala file with the same name as the class, object, or trait.

5.1 What fault tolerance is (and what it isn’t)

Let’s start with a definition of what we refer to here as a fault-tolerant system and why you’d write code to embrace the notion of failure. In an ideal world, a system is always available and can guarantee that every action it undertakes succeeds. There are only two paths to this ideal: using components that can never fail, or accounting for every possible fault with a recovery action that is itself guaranteed to succeed. In most architectures, what you have instead is a catch-all mechanism that terminates the application as soon as an uncaught failure arises.
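
As a minimal sketch of the catch-all approach described above, the following plain Scala shows how a single uncaught failure ends the whole run, including work that would have succeeded. The names `CatchAllExample`, `process`, and `runAll` are hypothetical and do not come from the chapter's source code; no Akka API is involved.

```scala
import scala.util.control.NonFatal

object CatchAllExample {
  // Hypothetical unit of work: parsing fails on non-numeric input.
  def process(input: String): Int = input.trim.toInt

  // Catch-all handler: the first failure aborts the entire batch.
  // Nothing is recovered; the run simply reports that it terminated.
  def runAll(inputs: List[String]): Either[String, List[Int]] =
    try Right(inputs.map(process))
    catch {
      case NonFatal(e) => Left(s"terminated: ${e.getClass.getSimpleName}")
    }
}
```

Calling `CatchAllExample.runAll(List("1", "x", "3"))` yields a `Left`, even though `"1"` and `"3"` are valid on their own. The one bad element takes the rest of the work down with it, which is the behavior a fault-tolerant design tries to avoid.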

5.1.1 Plain old objects and exceptions

5.1.2 Wrap it up and let it crash

5.2 Actor lifecycle events: Signals

5.3 Supervision strategies and signals

5.3.1 Uneventful resuming

5.3.2 Stopping and the PostStop signal

5.3.3 Restart and the PreRestart signal

5.3.4 Custom strategy

5.4 Watching signals from an actor

5.5 Back to the initial use case

Summary