2 Modeling and measuring latency


This chapter covers

  • Designing with laws of latency in mind
  • Thinking of latency as a distribution
  • Discovering common sources of latency
  • Understanding how latency compounds
  • Measuring latency correctly

In the previous chapter, we discussed what latency is and is not, and why we care. We defined latency as the time delay between a cause and its observed effect, which is context-specific. For example, when you browse a web page, we are generally interested in the request-response latency between the client and the server. For lower-level system components, we might instead be interested in the latency between a packet arriving from the network and the userspace application finishing processing it. We also discussed how latency relates to bandwidth and throughput, and the trade-offs between latency and energy efficiency.

In this chapter, we dive into the details of how to model and measure latency, which is essential as you build for low latency. First, you will learn about two important principles: Little's Law and Amdahl's Law. Both are theoretical results that yield practical insights for system design and performance. Little's Law relates latency, throughput, and concurrency: in a stable system, the average number of requests in flight equals the arrival rate multiplied by the average latency. Amdahl's Law concerns latency and parallelism: the speedup you can achieve by parallelizing a task is bounded by the fraction of the task that must run serially.
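To make the two laws concrete before we examine them in detail, here is a minimal sketch in Python. The language choice and the function names (littles_law_concurrency, amdahl_speedup) are ours, for illustration only; the formulas are the standard statements of the two laws.

    def littles_law_concurrency(arrival_rate_per_s, mean_latency_s):
        # Little's Law: L = lambda * W. In a stable system, the average
        # number of requests in flight (L) equals the arrival rate
        # (lambda) times the average latency (W).
        return arrival_rate_per_s * mean_latency_s

    def amdahl_speedup(parallel_fraction, workers):
        # Amdahl's Law: speedup = 1 / ((1 - p) + p / N), where p is the
        # fraction of the task that parallelizes and N is the number of
        # parallel workers. The serial fraction (1 - p) caps the speedup.
        serial_fraction = 1.0 - parallel_fraction
        return 1.0 / (serial_fraction + parallel_fraction / workers)

    # A service handling 1,000 requests/s at 50 ms mean latency keeps
    # about 50 requests in flight at any moment.
    print(littles_law_concurrency(1_000, 0.050))  # 50.0

    # If 90% of a task parallelizes, 16 workers yield at most ~6.4x.
    print(amdahl_speedup(0.9, 16))  # 6.4

Note what the second example implies: even with 16 workers, the 10% serial portion limits the speedup to 6.4x, and no amount of additional hardware can push it past 10x.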

2.1 Laws of latency

2.1.1 Little’s Law

2.1.2 Amdahl’s Law

2.2 Latency distribution

2.3 Common sources of latency

2.3.1 Physics

2.3.2 CPU and hardware

2.3.3 Virtualization

2.3.4 Operating system, drivers, and firmware

2.3.5 Managed runtime

2.3.6 Application

2.4 Compounding latency

2.5 Measuring latency

2.6 Putting it together: Measuring network latency

2.6.1 Plotting with histograms

2.6.2 Plotting with eCDF

2.7 Summary