9 Using replication and sharding
This chapter covers
- Learning the MongoDB replica set concept
- Identifying replica set members
- Understanding the MongoDB oplog
- Tracking change streams
- Creating sharded clusters in Atlas
- Scaling horizontally with sharding
People often mix up replication and sharding, though they're different systems used in database management for distinct purposes. What's the difference? Replication involves copying data and operations from a primary server to secondary ones to enhance data availability. It's particularly useful for disaster recovery and for distributing read queries among multiple nodes to improve read performance and reduce load on the primary. However, all write operations still go through the primary server, which can become a bottleneck.
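To make the write path and the read distribution concrete, here is a minimal in-memory sketch in Python. It is a toy model, not a MongoDB API: the `ToyReplicaSet` class, its method names, and the simplified `oplog` list are all hypothetical and exist only for illustration. Real replication is continuous and far more involved.

```python
class ToyReplicaSet:
    """Toy in-memory model of one primary and several secondaries.

    Hypothetical illustration only; none of these names are MongoDB APIs.
    """

    def __init__(self, secondary_count=2):
        self.primary = {}
        self.secondaries = [{} for _ in range(secondary_count)]
        self.oplog = []  # ordered record of writes applied on the primary
        self._next_reader = 0

    def write(self, key, value):
        # Every write goes through the single primary and is logged.
        self.primary[key] = value
        self.oplog.append((key, value))

    def replicate(self):
        # Each secondary replays the log in order to catch up.
        # Replaying the whole log is wasteful but safe here, because
        # re-applying the same write leaves the same final state.
        for secondary in self.secondaries:
            for key, value in self.oplog:
                secondary[key] = value

    def read(self, key):
        # Reads are spread across the secondaries in round-robin order.
        node = self.secondaries[self._next_reader % len(self.secondaries)]
        self._next_reader += 1
        return node.get(key)


rs = ToyReplicaSet()
rs.write("user:1", {"name": "Ada"})
print(rs.read("user:1"))  # None: the secondaries have not caught up yet
rs.replicate()
print(rs.read("user:1"))  # {'name': 'Ada'}
```

The first read returns nothing because, in this sketch, secondaries only see a write after they replay the log. Note also that however many secondaries are added, `write` still touches only the one primary, which is the bottleneck described above.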
Conversely, sharding partitions a large database into smaller segments, known as shards, with each shard housing a fraction of the complete data set on its own database server instance. Because the data set is distributed across multiple server instances, writes that target different shards are handled by each shard's own primary, which reduces the write bottleneck. To preserve data integrity and availability, each shard should itself use replication.
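The way sharding spreads writes can be sketched with a similar toy model. The example below is an assumption-laden simplification: `ToyShardedCluster` and its methods are hypothetical names, and routing by a hash of the key modulo the shard count is just one simple placement rule chosen for illustration, not a description of how MongoDB distributes data.

```python
import hashlib


class ToyShardedCluster:
    """Toy in-memory model of a sharded data set.

    Hypothetical illustration only; the hash-modulo routing is a
    simplification and not MongoDB's actual placement logic.
    """

    def __init__(self, shard_count=3):
        self.shards = [{} for _ in range(shard_count)]
        self.writes_per_shard = [0] * shard_count

    def shard_for(self, shard_key):
        # Hash the key so that placement is stable and roughly even.
        digest = hashlib.sha256(str(shard_key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def write(self, shard_key, document):
        # Each write lands on exactly one shard, so different shards
        # can absorb writes independently of one another.
        index = self.shard_for(shard_key)
        self.shards[index][shard_key] = document
        self.writes_per_shard[index] += 1

    def read(self, shard_key):
        # A read by shard key only needs to visit the owning shard.
        return self.shards[self.shard_for(shard_key)].get(shard_key)


cluster = ToyShardedCluster(shard_count=3)
for user_id in range(1000):
    cluster.write(user_id, {"user_id": user_id})
print(cluster.writes_per_shard)  # three counts that sum to 1000
```

Each shard holds only part of the data and receives only part of the write load. In this sketch a shard is a single dictionary, so losing one would lose its portion of the data, which is why each shard also needs replication.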