12 Moving data in bulk with ScyllaDB
This chapter covers
- Reading an entire table efficiently via token-range queries
- Tracking changes to a table using change data capture
- Using tooling to migrate data into ScyllaDB
- Validating data via dual-reading to verify a data migration
Throughout the book, your queries have assumed the upstream user’s perspective: inserting one row at a time (or perhaps several, via a batch write) and reading a single partition of data. A database, however, isn’t only queried by a user-facing API. Your company may extract data for analysis into a centralized data warehouse, combining Scylla’s data with other sources so they can be queried together. Alternatively, maybe you’re working to migrate some legacy data into Scylla; you have more options than simply writing every row to two places.
In this final chapter, you’ll learn how to work with Scylla’s data at database scale. You’ll learn the techniques that work best for bulk reading and writing data with Scylla and how to apply them to your use cases. First, let’s examine how you can export data from Scylla through bulk reading, and how to balance read speed against the impact on the database’s other traffic.
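To preview the idea behind token-range reads: a full-table scan is carved into queries over contiguous slices of the token ring, which clients can issue in parallel. The following sketch splits the Murmur3 token space into subranges in plain Python; the keyspace, table, and partition-key names in the sample query are hypothetical placeholders, not part of any real schema.

```python
# Sketch: splitting ScyllaDB's full Murmur3 token space into contiguous
# subranges -- the building block of a token-range table scan.

MIN_TOKEN = -2**63       # smallest Murmur3 token
MAX_TOKEN = 2**63 - 1    # largest Murmur3 token

def token_subranges(n):
    """Divide the token ring into n contiguous (start, end] subranges."""
    span = (MAX_TOKEN - MIN_TOKEN) // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # The last subrange absorbs any rounding remainder.
        end = MAX_TOKEN if i == n - 1 else start + span
        ranges.append((start, end))
        start = end
    return ranges

# Each subrange becomes one CQL query; a client can run them in parallel.
# Table and column names here are illustrative only.
QUERY = ("SELECT * FROM my_keyspace.my_table "
         "WHERE token(pk) > {start} AND token(pk) <= {end}")

for start, end in token_subranges(4):
    print(QUERY.format(start=start, end=end))
```

In practice you would align the subranges with the cluster's actual token ownership and cap concurrency so the scan doesn't starve foreground traffic, both of which this chapter covers.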