3 Colocation
This chapter covers
- Colocating code and data as a latency optimization
- Optimizing for low latency in distributed systems
- Optimizing for low latency in multicore systems
Welcome to the second part of the book!
In the book's first part, Chapters 1 and 2, we explored latency as a performance metric and learned how to model and measure it, an essential prerequisite for optimizing for low latency. In this second part of the book, we turn our focus to optimizing data-related latency.
The first data-related optimization we'll look at is colocation. Consider the following scenario: your web application queries a database server located 6000 kilometers (~3700 miles) away, adding 60 ms of network round-trip latency to every request. By moving that database to the same data center as your application, you reduce query latency to under 10 ms, a 6x improvement from a simple change in placement.
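The 60 ms figure is no accident: it is close to the physical lower bound imposed by the speed of light in fiber. A back-of-the-envelope sketch (the ~200,000 km/s figure is the commonly used approximation of two-thirds the speed of light in vacuum; real links add routing and queuing delay on top):

```python
# Minimum round-trip latency dictated by distance alone.
SPEED_OF_LIGHT_IN_FIBER_KM_S = 200_000  # roughly 2/3 of c in vacuum

def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fiber link of the given length."""
    one_way_s = distance_km / SPEED_OF_LIGHT_IN_FIBER_KM_S
    return 2 * one_way_s * 1000  # there and back, in milliseconds

print(min_round_trip_ms(6000))  # 6000 km away: 60.0 ms per round trip
print(min_round_trip_ms(100))   # same region: 1.0 ms
```

No protocol tuning can beat this bound; only moving the data closer can.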
Colocation works by minimizing the physical distance data must travel. If your application accesses a database, bringing the database closer makes data access faster. Taking this to the extreme, embedding a database directly in your application eliminates network communication entirely. Colocation also applies to system-to-system communication, as high-frequency traders discovered long ago: placing their automated trading servers in the same data center as the exchange gives them a crucial latency advantage over competitors located elsewhere.
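To make the extreme case concrete, here is a minimal sketch using SQLite (one example of an embedded database, not necessarily the one you'd choose): the database runs inside the application process, so a query is a function call rather than a network round trip. The table and data are illustrative.

```python
import sqlite3

# An embedded database lives in the application's own process and
# address space, so no network hop is involved in a query.
conn = sqlite3.connect(":memory:")  # in-memory database in this process
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
print(row[0])  # prints "Ada"
conn.close()
```

The trade-off is that the data now lives with a single application instance, which is why later chapters treat colocation as one placement strategy among several rather than a universal answer.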