10 Distributed consensus

This chapter covers

  • Consensus
  • State machine replication
  • Raft consensus protocol

Distributed consensus is a foundational abstraction of distributed systems. Long believed impossible to achieve, distributed consensus serves as a cornerstone for building reliable and scalable distributed systems. Distributed consensus allows a group of redundant processes to advance in lockstep and act as one. As a result, at any time, some processes in the group can compensate for the failure of others. Failing to reach consensus is often catastrophic for the application at hand. Did the transaction commit or abort? Did operation A happen before operation B? Was the lock acquired by component 1 or component 2? Any disagreement on these questions quickly results in incorrect behavior. Therefore, the consensus problem has garnered outsized interest in both the theoretical and the practical realms of software engineering.
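To make "advance in lockstep and act as one" concrete, here is a minimal sketch. It assumes that consensus has already produced a single agreed-upon log of commands and does not implement consensus itself. The `Put` command, the `Replica` class, and the `advance` and `read` helpers are hypothetical illustrations, not the chapter's implementation. Each replica is a deterministic key-value store; because every replica applies the same commands in the same order, all replicas reach the same state, and a surviving replica can answer on behalf of a failed one.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Put:
    """Hypothetical command: set key to value."""
    key: str
    value: int

@dataclass
class Replica:
    """A deterministic state machine: a tiny in-memory key-value store."""
    name: str
    state: dict = field(default_factory=dict)
    applied: int = 0      # number of log entries applied so far
    alive: bool = True

    def apply(self, command: Put) -> None:
        self.state[command.key] = command.value
        self.applied += 1

# ASSUMPTION: consensus has already agreed on this log.
# Every replica sees the same commands in the same order.
agreed_log = [Put("x", 1), Put("y", 2), Put("x", 3)]
replicas = [Replica("A"), Replica("B"), Replica("C")]

def advance(replicas, log):
    """Each live replica applies the entries it has not applied yet."""
    for r in replicas:
        if r.alive:
            for command in log[r.applied:]:
                r.apply(command)

def read(replicas, key):
    """Serve a read from any live replica; they all hold the same state."""
    for r in replicas:
        if r.alive:
            return r.state.get(key)
    raise RuntimeError("no live replica")

advance(replicas, agreed_log)
replicas[0].alive = False          # simulate a crash of replica A

print([r.state for r in replicas])  # [{'x': 3, 'y': 2}, {'x': 3, 'y': 2}, {'x': 3, 'y': 2}]
print(read(replicas, "x"))          # 3

# Without agreement on order, a replica diverges:
d = Replica("D")
for command in reversed(agreed_log):
    d.apply(command)
print(d.state)                      # {'x': 1, 'y': 2}
```

The last replica applies the same commands in a different order and ends with a different value for `x`, which is exactly the kind of disagreement ("Did operation A happen before operation B?") that consensus on the log rules out.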

10.1 The challenge of reaching agreement

10.2 System model

10.3 State machine replication

10.4 The origin—and irony—of consensus

10.5 Implementing consensus

10.5.1 Leader-based consensus

10.5.2 Quorum-based consensus

10.5.3 Combining leader and quorum

10.6 Raft

10.6.1 The log

10.6.2 Terms

10.6.3 Leader election protocol

10.6.4 Log replication

10.6.5 State machine safety

10.7 Raft puzzles

10.7.1 Puzzle 1

10.7.2 Puzzle 2

10.7.3 Puzzle 3

10.8 Summary