chapter five

5 Partitioning

This chapter covers

Understanding the benefits and downsides of partitioning
Partitioning strategies and how to choose one
Request routing when partitioning
Mitigating skewed workloads and hot partitions

In the previous two chapters, we explored techniques for reducing latency with colocation and replication. Colocation places related components, such as business logic and the data it uses, close together, which minimizes the network distance between them and therefore the time they spend communicating. For example, an application that uses serverless functions for backend logic may benefit from a database that is colocated with the serverless runtime.

Replication, on the other hand, copies relevant data to multiple locations while maintaining consistency between the copies. With replication, you get the benefits of colocation in every location that holds a copy. However, replicating the entire dataset to many locations can be impractical because of storage costs and network bandwidth requirements. Maintaining consistency across replicas can also introduce significant coordination overhead: every write may need to be synchronized across locations, which creates latency bottlenecks rather than removing them.

5.1 Why partition data?

5.2 Physical partitioning strategies

5.2.1 Horizontal partitioning

5.2.2 Vertical partitioning

5.2.3 Hybrid partitioning

5.3 Logical partitioning strategies

5.3.1 Functional partitioning

5.3.2 Geographical partitioning

5.3.3 User-based partitioning

5.3.4 Time-based partitioning

5.3.5 Overpartitioning

5.4 Request routing

5.4.1 Direct routing

5.4.2 Proxy routing

5.4.3 Forward routing

5.5 Partition imbalance

5.5.1 Hot partitions

5.5.2 Skewed workloads

5.6 Putting it together: Horizontal partitioning with SQLite