8 ScyllaDB’s architecture

 

This chapter covers

  • Scylla’s architectural goals
  • The distributed systems that Scylla utilizes to provide speed and fault tolerance
  • Scylla’s on-disk architecture
  • The interplay between these components when performing cluster operations

Thus far, the topics in this book have been very hands-on. You’ve built a cluster, designed a schema, and made queries against the database. This chapter takes a step back and fills in the context around Scylla’s behavior by examining its architecture. You’ll be able to answer questions like these:

  • How does a read find the data it needs?
  • What consistency options do you have, and which one should you use?
  • How does Scylla compact data?

In my experiences with running and operating Scylla, having a firm understanding of the database’s architecture is tremendously valuable in helping to reason about the database’s performance. I’ve tried to distill my biggest lessons learned throughout the chapter, but I have no doubt that the day after this book goes to the printer, I’ll learn something new about Scylla. It’s through understanding the database’s architecture that you can make a mental model of how it behaves and performs and how you can refine it when you learn something new.

8.1 Scylla’s design goals

8.2 Distributed systems in Scylla

8.2.1 Revisiting the hash ring

8.2.2 Consistency

8.2.3 Communication protocols

8.2.4 Gossip

8.2.5 Consensus

8.3 On-node architecture

8.3.1 The memtable and the commit log

8.3.2 Shards

8.3.3 SSTables

8.3.4 Tablets: The future

8.4 Cluster operations

8.4.1 Compaction

8.4.2 Repairs

8.4.3 Hinted handoff

Summary