11 Durable executions

 

This chapter covers

  • Short-running versus long-running processes
  • Failure-free definitions
  • Failure-tolerant executions
  • Sagas versus durable executions

Durable executions, an emerging concept in software engineering, are to distributed systems what transactions are to databases: an abstraction concealing the possibility of failure.

“In the presence of partial failure, even the most basic rules for reasoning about computations do not hold.”

—An Equational Theory for Transactions (https://lampwww.epfl.ch/~cremet/publications/fsttcs03.pdf)

11.1 The pitfalls of partial executions

Imagine a user signing up for a video or music streaming platform. During registration, the platform charges the user's credit card and then grants access to its content library. Listing 11.1 shows the steps of the signup function.

Listing 11.1 User signup function
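async function signup(user) {
  // Step 1: charge the user's credit card
  const charge = await Payment.create({
    ...
  });

  // 💀 #A A crash here leaves the user charged but without an account

  // Step 2: create the account that grants access
  const account = await Account.create({
    ...
  });
}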

At first glance, the function appears fine. On closer inspection, we notice a problem: the function handles only the happy path and ignores the possibility of failure. If the function crashes after charging the credit card but before creating the account, that is, if the function executes only partially, the user is charged but never gains access.
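To make the partial execution concrete, the following sketch replays the crash at the 💀 marker in listing 11.1 against in-memory stand-ins. Everything beyond the shape of signup is an assumption made for illustration: the Payment and Account objects are hypothetical stubs rather than a real payment or database API, the userId and amount fields are invented, and the crashAfterCharge flag exists only to simulate the process dying between the two steps.

```javascript
// Hypothetical in-memory stand-ins for the payment provider and the account store.
const charges = [];
const accounts = [];

const Payment = {
  async create({ userId, amount }) {
    charges.push({ userId, amount });
    return { userId, amount };
  },
};

const Account = {
  async create({ userId }) {
    accounts.push({ userId });
    return { userId };
  },
};

// Same shape as listing 11.1, plus an injectable crash point between the steps.
async function signup(user, { crashAfterCharge = false } = {}) {
  const charge = await Payment.create({ userId: user.id, amount: user.amount });

  if (crashAfterCharge) {
    // Simulates the process dying at the 💀 marker: nothing after this line runs.
    throw new Error("simulated crash");
  }

  const account = await Account.create({ userId: user.id });
  return { charge, account };
}

// Runs one partial execution and reports the state it leaves behind.
async function demo() {
  try {
    await signup({ id: "u1", amount: 10 }, { crashAfterCharge: true });
  } catch (err) {
    // The caller learns only that signup failed, not how far it got.
  }
  return { charged: charges.length, accounts: accounts.length };
}
```

Calling demo() resolves to a state with one recorded charge and no account, which is exactly the inconsistency described above. A thrown error is only an approximation of a crash: a real process failure would not give the caller a chance to catch anything.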

11.2 System model

11.2.1 Process definition

11.2.2 Process execution

11.3 The concept of failure-transparent recovery

11.4 Strategies of failure-transparent recovery

11.4.1 Restart

11.4.2 Resume

11.5 Implementation of failure-transparent recovery

11.5.1 Application-level implementation: Sagas

11.5.2 Platform-level implementation: Durable executions

11.6 Summary