chapter nine

9 Using replication and sharding

This chapter covers

Learning the MongoDB replica set concept
Identifying replica set members
Understanding the MongoDB oplog
Tracking change streams
Creating sharded clusters in Atlas
Horizontal scaling with sharding

People often mix up replication and sharding, though they're different systems used in database management for distinct purposes. What's the difference? Replication involves copying data and operations from a primary server to secondary ones to enhance data availability. It's particularly useful for disaster recovery and for distributing read queries among multiple nodes to improve read performance and reduce load on the primary. However, all write operations still go through the primary server, which can become a bottleneck.

Conversely, sharding partitions a large database into smaller segments, known as shards, with each shard housing a fraction of the complete data set on its own database server instance. Because the entire dataset is distributed across multiple server instances, write operations affecting multiple shards can be handled by the corresponding primary server instances, reducing the write bottleneck. It's crucial that to preserve data integrity and availability, each shard should implement replication.

9.1 Ensuring data high availability with replication

9.1.1 Distinguishing replica set members

9.1.2 Electing primary replica set member

9.1.3 Understanding oplog collection

9.2 Understanding change streams

9.2.1 Connections for a change stream

9.2.2 Change streams with Node.js

9.2.3 Modifying the output of a change stream

9.3 Scaling data horizontally through sharding

9.3.1 Viewing sharded cluster architecture

9.3.2 Creating sharded cluster via Atlas CLI

9.3.3 Working with shard key

9.3.4 Choosing a shard key

9.3.5 Using shard key analyzer

9.3.6 Detecting shard data imbalance or uneven data distribution

9.3.7 Resharding collection

9.3.8 Understanding chunks balancing

9.3.9 Administrating chunks

9.3.10 Automerging chunks

9.4 MongoDB 8.0 sharded cluster features

9.5 Managing data consistency and availability with Read/Write Concern and Read Preference

9.5.1 Write Concern

9.5.2 Read Concern

9.5.3 Read Preference

9.6 Summary