12 Moving data in bulk with ScyllaDB

 

This chapter covers

  • Reading an entire table efficiently via token-range queries
  • Tracking changes to a table using change data capture
  • Using tooling to migrate data into ScyllaDB
  • Validating a data migration by dual reading

Throughout the book, your queries have taken the perspective of an upstream user: inserting one row at a time (or perhaps several, with a batch write) and reading a single partition of data. A database, however, isn’t queried only through a user-facing API. Your company may extract data for analysis into a centralized data warehouse, combining the Scylla data with other sources so they can be queried together. Or perhaps you’re migrating legacy data into Scylla; you have more options than simply writing every row to two places.
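
To make that contrast concrete, here’s the shape of query the book has relied on so far; the keyspace, table, and column names are hypothetical stand-ins, not tables from earlier chapters:

-- The per-row access pattern used throughout the book
-- (hypothetical app.readings table)
INSERT INTO app.readings (sensor_id, taken_at, value)
VALUES (42, '2024-06-01T00:00:00Z', 19.5);

SELECT taken_at, value
FROM app.readings
WHERE sensor_id = 42;  -- touches exactly one partition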

In this final chapter, you’ll learn how to work with Scylla’s data at database scale: the techniques that work best for bulk reading and writing with Scylla, and how to apply them to your own use cases. First, let’s examine how to export data from Scylla through bulk reads, choosing options that balance read speed against the needs of other traffic on the database.
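
As a preview of the token-range technique that section 12.1.1 covers in depth, a full-table export reduces to a series of queries like the sketch below, each scanning one slice of the token ring. The table is the same hypothetical one as above, and the boundaries shown are just one example split:

-- Scan one quarter of the token ring (Murmur3 tokens run
-- from -2^63 to 2^63 - 1); repeat with the remaining
-- ranges to cover the whole table
SELECT sensor_id, taken_at, value
FROM app.readings
WHERE token(sensor_id) >= -9223372036854775808
  AND token(sensor_id) <= -4611686018427387904;

Splitting the ring this way lets a client parallelize the export and throttle it independently of user-facing traffic.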

12.1 Extracting data from ScyllaDB

12.1.1 Using token ranges

12.1.2 Change data capture

12.2 Migrating to ScyllaDB

12.2.1 Dual writing

12.2.2 SSTableLoader

12.2.3 Spark Migrator

12.2.4 Writing a migrator

12.2.5 Validating migrations

Summary