
12 Optimizations

 

This chapter covers

  • Explaining the concept of mechanical sympathy and applying it to low-level concerns such as CPU caches
  • Understanding the core differences between the heap and the stack, and how to reduce allocations
  • Using standard Go diagnostics tooling
  • Delving into how the garbage collector (GC) works
  • Discussing the impact of running Go inside Docker and Kubernetes

Before delving into this chapter, one disclaimer: in most contexts, writing readable and clear code is better than writing optimized code that is more complex to understand. Indeed, optimization generally comes at a price, and we encourage readers to follow this famous quote from Wes Dyer:

Make it correct, make it clear, make it concise, make it fast, in that order.

  -- Wes Dyer

That being said, it doesn't mean that optimizing an application for speed and efficiency is something to prohibit. For example, we can try to identify code paths that need to be optimized because there is a genuine need for it, such as keeping our customers happy or reducing our costs. Throughout this chapter, we will discuss common optimization techniques; some of them are specific to Go, and some aren't. We will also discuss methods to identify bottlenecks so that we don't work blindly.
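
As a concrete way to avoid working blindly, here is a minimal sketch of a Go benchmark that measures a code path before we try to optimize it. The concat function and its inputs are hypothetical, not taken from this book; the testing package, the -bench, -benchmem, and -cpuprofile flags, and go tool pprof are part of the standard Go toolchain.

package concat

import (
	"strings"
	"testing"
)

// concat is a hypothetical function we suspect of being a bottleneck.
func concat(values []string) string {
	var sb strings.Builder
	for _, v := range values {
		sb.WriteString(v)
	}
	return sb.String()
}

// BenchmarkConcat measures concat. Run it with
//	go test -bench=Concat -benchmem -cpuprofile=cpu.out
// and inspect the resulting profile with: go tool pprof cpu.out
func BenchmarkConcat(b *testing.B) {
	values := []string{"hello", "world", "foo", "bar"}
	for i := 0; i < b.N; i++ {
		_ = concat(values)
	}
}

The -benchmem output also reports allocations per operation, which ties into the heap, stack, and allocation discussions later in this chapter.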

12.1 #91 - Not understanding CPU caches

Mechanical sympathy is a term coined by Jackie Stewart, a three-time F1 world champion:

You don't have to be an engineer to be a racing driver, but you do have to have mechanical sympathy.

  -- Jackie Stewart
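
To give a first flavor of mechanical sympathy at the CPU-cache level, here is a minimal, illustrative sketch (the function names and matrix size are hypothetical, not taken from this book). Both functions compute the same sum, but the row-by-row traversal follows the memory layout of the slices and tends to reuse each fetched cache line, whereas the column-by-column traversal touches a different cache line on almost every access.

package main

import "fmt"

const n = 1 << 10 // hypothetical 1,024 x 1,024 matrix

// sumRowMajor walks the matrix in memory order: each cache line
// fetched from memory is fully used before moving on.
func sumRowMajor(m [][]int64) int64 {
	var total int64
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			total += m[i][j]
		}
	}
	return total
}

// sumColumnMajor computes the same result but jumps between rows,
// so consecutive accesses rarely fall on the same cache line.
func sumColumnMajor(m [][]int64) int64 {
	var total int64
	for j := 0; j < n; j++ {
		for i := 0; i < n; i++ {
			total += m[i][j]
		}
	}
	return total
}

func main() {
	m := make([][]int64, n)
	for i := range m {
		m[i] = make([]int64, n)
	}
	fmt.Println(sumRowMajor(m), sumColumnMajor(m))
}

Benchmarking the two versions usually shows a noticeable gap; the subsections that follow explain why, in terms of cache lines, predictability, and placement policies.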

12.1.1 CPU architecture

12.1.2 Cache line

12.1.3 Slice of structs vs. struct of slices

12.1.4 Predictability

12.1.5 Cache placement policy

12.2 #92 - Writing concurrent code leading to false sharing

12.3 #93 - Not taking into account instruction-level parallelism

12.4 #94 - Not being aware of data alignment

12.5 #95 - Not understanding stack vs. heap

12.5.1 Stack vs. heap

12.5.2 Escape analysis

12.6 #96 - Not knowing how to reduce allocations

12.6.1 API change

12.6.2 Compiler optimizations

12.6.3 sync.Pool

12.7 #97 - Not relying on inlining