Chapter 8
This chapter covers
- Understanding synchronization and mutual exclusion
- Working with atomics and memory barriers
- Building your own wait-free data structures
 
In the previous chapter, we explored typical sources of redundant work and strategies for eliminating them, thereby reducing latency. However, optimizing the use of a single CPU will not always suffice to meet stringent latency requirements. In such cases, exploiting the parallelism offered by multiple CPUs becomes crucial. If your application allows for data partitioning, a technique discussed in chapter 5 that involves dividing data into independent chunks, you can scale performance by adding more CPUs. This approach can significantly reduce latency for many use cases and workloads.
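To make the idea of data partitioning concrete, here is a minimal sketch (the function name `parallel_sum` and the chunking strategy are illustrative assumptions, not code from this book): a vector is split into independent chunks, one per thread, and each thread writes only to its own slot of a per-thread results array, so the workers need no synchronization at all.

```cpp
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Illustrative sketch: sum a vector by partitioning it into independent
// chunks, one per thread. Each worker reads its own chunk and writes only
// its own slot in `partials`, so no locks or atomics are required.
long parallel_sum(const std::vector<long>& data, unsigned num_threads) {
    std::vector<long> partials(num_threads, 0);
    std::vector<std::thread> workers;
    // Round up so every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(t * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            partials[t] = std::accumulate(data.begin() + begin,
                                          data.begin() + end, 0L);
        });
    }
    for (auto& w : workers) w.join();
    // Only this final combining step touches data produced by all threads,
    // and it runs after join(), so it is race-free.
    return std::accumulate(partials.begin(), partials.end(), 0L);
}
```

Because the chunks share no mutable state, this pattern scales with the number of CPUs; the synchronization techniques covered in this chapter become necessary only when such a clean partition is not possible.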