3 Data modeling in ScyllaDB
This chapter covers
- Performing query-first design
- Understanding how ScyllaDB distributes data across the cluster
- Implementing query-first design to build a schema for a sample application
Storing data is the easy part of running a database. ScyllaDB is designed to store data; databases in general are built to make storing and retrieving data easy. The hard part, I often find, is determining what data you need to store and how to store it so that it is easy to access.
Developing a database schema usually begins with an idea for an application. That application has requirements, and those requirements relate to the database, either directly or indirectly. You, and probably others as well, iterate on those requirements until you have something that most likely needs data to live in a database. You then take the requirements and translate them into a database schema.
In chapter 1, you learned that ScyllaDB is a different kind of database: it distributes data across multiple nodes to provide better scalability and fault tolerance. In chapter 2, you learned that ScyllaDB achieves this distribution by partitioning the data based on a partition key derived from the primary key of a row. This relationship means that you share responsibility for achieving ScyllaDB’s benefits: the primary keys you choose determine how data is partitioned, so a schema designed to distribute data effectively lets the database deliver the scalability and fault tolerance it promises.
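To make the link between key choice and distribution concrete, here is a minimal Python sketch. It is a toy model, not ScyllaDB's actual placement logic: it assumes a hypothetical three-node cluster, uses MD5 as a stand-in hash, and places partitions with a simple modulo instead of a token ring. The names `node_for` and `rows_per_node` are invented for this illustration.

```python
import hashlib
from collections import Counter

NODE_COUNT = 3  # hypothetical cluster size, chosen only for illustration


def node_for(partition_key: str, node_count: int = NODE_COUNT) -> int:
    """Map a partition key to a node index with a stable hash.

    MD5 plus modulo is a simplification; it only models the idea that
    the same partition key always lands on the same node.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return token % node_count


def rows_per_node(partition_keys, node_count: int = NODE_COUNT) -> list:
    """Count how many rows each node would hold for the given keys."""
    counts = Counter(node_for(key, node_count) for key in partition_keys)
    return [counts.get(node, 0) for node in range(node_count)]


# Many distinct partition keys: rows spread across all nodes.
by_user = rows_per_node(f"user-{i}" for i in range(9000))

# One partition key shared by every row: a single node holds everything.
by_country = rows_per_node("US" for _ in range(9000))

print("partitioned by user id:", by_user)
print("partitioned by country:", by_country)
```

Running the sketch should show the per-user keys spread over all three nodes, while the single shared key piles every row onto one node. The same reasoning is why a schema's key choice matters as much as the database's ability to distribute data.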