This chapter covers
- Learning the MongoDB replica set concept
- Identifying replica set members
- Understanding the MongoDB oplog
- Tracking change streams
- Creating sharded clusters in Atlas
- Scaling horizontally with sharding
People often confuse replication and sharding, but they’re distinct mechanisms that solve different problems. What’s the difference? Replication copies data and operations from a primary server to one or more secondary servers to improve data availability. It’s particularly useful for disaster recovery and for distributing read queries across multiple nodes, which improves read performance and reduces load on the primary. All write operations still go through the single primary server, however, so the primary can become a bottleneck.
Conversely, sharding partitions a large database into smaller segments, known as shards, each holding a fraction of the complete data set on its own database server instance. Because the data set is distributed across multiple server instances, writes that target different shards are handled in parallel by each shard’s own primary, which reduces the write bottleneck. To preserve data integrity and availability, each shard is itself deployed as a replica set.
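To make the contrast concrete, here is a minimal toy model in plain Python. It is not MongoDB code and uses no driver; the names `ReplicaSet`, `ShardedCluster`, `shard_for`, and `demo` are hypothetical, and the modulo-of-a-hash routing is a deliberate simplification of how a real cluster assigns keys to shards. It only illustrates the idea that a replica set funnels every write through one primary, while a sharded cluster spreads writes across several primaries, each with its own copies.

```python
import hashlib


class ReplicaSet:
    """Toy replica set: one primary plus secondaries that replay its log."""

    def __init__(self, secondaries=2):
        self.primary = {}
        self.oplog = []  # ordered record of writes accepted by the primary
        self.secondaries = [{} for _ in range(secondaries)]

    def write(self, key, value):
        # Every write goes through the single primary.
        self.primary[key] = value
        self.oplog.append((key, value))

    def replicate(self):
        # Secondaries catch up by replaying the logged writes in order.
        for node in self.secondaries:
            for key, value in self.oplog:
                node[key] = value


class ShardedCluster:
    """Toy sharded cluster: each shard is its own replica set."""

    def __init__(self, shard_count=3):
        self.shards = [ReplicaSet() for _ in range(shard_count)]

    def shard_for(self, key):
        # Simplified routing (an assumption for illustration):
        # hash the key and take it modulo the number of shards.
        digest = hashlib.md5(str(key).encode()).hexdigest()
        return int(digest, 16) % len(self.shards)

    def write(self, key, value):
        # Only the primary of the owning shard handles this write.
        self.shards[self.shard_for(key)].write(key, value)


def demo(n=300):
    """Send the same n writes to a lone replica set and to a 3-shard cluster."""
    single = ReplicaSet()
    cluster = ShardedCluster(shard_count=3)
    for i in range(n):
        single.write(f"user{i}", i)
        cluster.write(f"user{i}", i)
    single.replicate()
    for shard in cluster.shards:
        shard.replicate()
    return single, cluster


if __name__ == "__main__":
    single, cluster = demo()
    print("writes on the lone primary:", len(single.oplog))
    print("writes per shard primary:", [len(s.oplog) for s in cluster.shards])
```

In the replica set, the lone primary records all 300 writes and each secondary ends up with a full copy. In the cluster, the same 300 writes are divided among three primaries, and each shard still keeps replicated copies of its own portion.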