Chapter 8
This chapter covers
- Understanding synchronization and mutual exclusion
- Working with atomics and memory barriers
- Building your own wait-free data structures
 
In the previous chapter, we explored typical sources of redundant work and strategies for eliminating them, thereby reducing latency. However, optimizing the use of a single CPU will not always suffice to meet stringent latency requirements. In such cases, exploiting the parallelism offered by multiple CPUs becomes crucial. If your application allows for data partitioning, a technique discussed in chapter 5 that involves dividing data into independent chunks, you can scale performance by adding more CPUs. This approach can significantly reduce latency for many use cases and workloads.
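To make the idea of data partitioning concrete, here is a minimal sketch (the function name `parallel_sum` and the chunking strategy are illustrative assumptions, not code from this book): a vector is split into independent chunks, one per thread, and each thread writes only to its own slot of a per-thread results array, so the workers need no synchronization at all.

```cpp
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Illustrative sketch: sum a vector by partitioning it into independent
// chunks, one per thread. Each worker reads its own chunk and writes only
// its own slot in `partials`, so no locks or atomics are required.
long parallel_sum(const std::vector<long>& data, unsigned num_threads) {
    std::vector<long> partials(num_threads, 0);
    std::vector<std::thread> workers;
    // Round up so every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(t * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            partials[t] = std::accumulate(data.begin() + begin,
                                          data.begin() + end, 0L);
        });
    }
    for (auto& w : workers) w.join();
    // Only this final combining step touches data produced by all threads,
    // and it runs after join(), so it is race-free.
    return std::accumulate(partials.begin(), partials.end(), 0L);
}
```

Because the chunks share no mutable state, this pattern scales with the number of CPUs; the synchronization techniques covered in this chapter become necessary only when such a clean partition is not possible.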