chapter three

3 Data modeling in ScyllaDB

This chapter covers

Performing query-first design
How ScyllaDB distributes data across the cluster
Implementing query-first design to build a schema for a sample application

Storing data is the easy part of running a database. ScyllaDB is designed to store data; databases in general are built to make storing and retrieving data easy. The hard part, I often find, is determining what data you need to store and how to structure it in the database so that it is easy to access.

Developing a database schema usually begins with an idea for an application. That application has requirements, and those requirements relate to the database, either directly or indirectly. You, and probably others as well, iterate on those requirements until you have a design that most likely needs to keep its data in a database. You then take the requirements and translate them into a database schema.

In chapter 1, you learned that ScyllaDB is a different kind of database: it distributes data across multiple nodes to provide better scalability and fault tolerance. In chapter 2, you learned that ScyllaDB achieves this distribution by partitioning the data based on a partition key derived from the primary key of a row. This relationship implies that you are responsible for achieving ScyllaDB’s benefits: if you design your data to be effectively distributed, the database as a whole benefits from that distribution.
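To make the link between a partition key and data placement concrete, here is a minimal Python sketch. It is a toy model, not ScyllaDB's actual partitioner: the node names, the `owner` and `distribute` helpers, and the hash-then-modulo placement are all assumptions chosen for illustration. What it does show is the property that matters for schema design: every row sharing a partition key lands on the same node, so the keys you choose decide how evenly the data spreads.

```python
import hashlib
from collections import defaultdict

# Hypothetical three-node cluster; the names are illustrative only.
NODES = ["node-a", "node-b", "node-c"]


def owner(partition_key: str) -> str:
    """Map a partition key to a node by hashing it.

    Toy stand-in for a real partitioner: hash the key, then take the
    result modulo the number of nodes.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def distribute(rows):
    """Group (partition_key, clustering_value, payload) rows by owning node."""
    placement = defaultdict(list)
    for partition_key, clustering_value, payload in rows:
        row = (partition_key, clustering_value, payload)
        placement[owner(partition_key)].append(row)
    return dict(placement)


# Sample rows: the first element plays the role of the partition key.
rows = [
    ("alice", 1, "first order"),
    ("alice", 2, "second order"),
    ("bob", 1, "first order"),
    ("carol", 1, "first order"),
]

placement = distribute(rows)
for node in NODES:
    print(node, placement.get(node, []))
```

Both of the `alice` rows always end up together, whichever node that happens to be. If nearly every row used the same partition key, one node would hold nearly all of the data, which is the kind of outcome a well-designed schema tries to avoid.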

3.1 Application design before schema design

3.1.1 Your query-first design toolbox

3.1.2 The sample application requirements

3.1.3 Determining the queries

3.2 Identifying tables

3.2.1 Denormalization

3.2.2 Extracting tables

3.3 Distributing data efficiently on the hash ring

3.3.1 The hash ring

3.3.2 Making good partitions

Summary