chapter three

3 Scaling databases

This chapter covers:

  • Various types of storage services.
  • Replicating databases.
  • Aggregating events to reduce database writes.
  • Normalization vs denormalization.
  • Caching frequent queries in memory.

In this chapter, we discuss concepts in scaling databases, their tradeoffs, and common databases that implement these concepts. We consider these concepts when choosing databases for the various services in our system.

3.1 Brief prelude on storage services

Storage services are stateful services. Unlike stateless services, stateful services need mechanisms to ensure consistency, and they require redundancy to avoid data loss. A stateful service may use a consensus mechanism like Paxos for strong consistency, or it may use an eventual-consistency mechanism. These decisions involve complex tradeoffs that depend on requirements such as consistency, complexity, security, latency, and performance. This is one reason we keep services stateless wherever possible and confine state to dedicated stateful services.
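The difference between the two kinds of service can be shown in a minimal Python sketch. All names here (`KeyValueStore`, `StatelessHandler`, `StatefulHandler`, `demo`) are hypothetical and exist only for illustration, and the in-memory `KeyValueStore` is merely a stand-in for a real storage service with its own replication and consistency mechanisms.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueStore:
    """Stand-in for a stateful storage service (hypothetical, in-memory only)."""
    _data: dict = field(default_factory=dict)

    def increment(self, key: str) -> int:
        # A real storage service would also replicate this write.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


class StatelessHandler:
    """Holds no data between requests; every read and write goes to the store."""

    def __init__(self, store: KeyValueStore) -> None:
        self.store = store

    def handle_visit(self, page: str) -> int:
        return self.store.increment(page)


class StatefulHandler:
    """Keeps counts in process memory; they are lost if this instance dies."""

    def __init__(self) -> None:
        self.counts: dict = {}

    def handle_visit(self, page: str) -> int:
        self.counts[page] = self.counts.get(page, 0) + 1
        return self.counts[page]


def demo() -> tuple:
    store = KeyValueStore()
    # Two interchangeable stateless instances, as if behind a load balancer.
    a, b = StatelessHandler(store), StatelessHandler(store)
    a.handle_visit("/home")
    shared_total = b.handle_visit("/home")  # sees a's write through the store

    # Two stateful instances each see only their own traffic.
    c, d = StatefulHandler(), StatefulHandler()
    c.handle_visit("/home")
    local_total = d.handle_visit("/home")
    return shared_total, local_total
```

In this sketch, any stateless instance can serve any request and can be replaced freely, because the only component that must worry about consistency and redundancy is the store. The stateful handlers disagree with each other as soon as traffic is split between them.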

3.2 When to use vs avoid databases

3.3 Replication

3.3.1 Distributing replicas

3.3.2 Single-leader replication

3.3.3 Multi-leader replication

3.3.4 Leaderless replication

3.3.5 HDFS replication

3.3.6 Further reading

3.4 Scaling storage capacity with sharded databases

3.5 Aggregating events

3.5.1 Single-tier aggregation

3.5.2 Multi-tier aggregation

3.5.3 Partitioning

3.5.4 Handling a large key space

3.5.5 Replication and fault-tolerance

3.6 Batch and streaming ETL

3.6.1 A simple batch ETL pipeline

3.6.2 Messaging terminology

3.6.3 Kafka vs RabbitMQ