chapter nine

9 Using replication and sharding

This chapter covers

Learning the MongoDB replica set concept
Identifying replica set members
Understanding the MongoDB oplog
Tracking change streams
Creating sharded clusters in Atlas
Scaling horizontally with sharding

People often confuse replication and sharding, but they’re different mechanisms that serve distinct purposes in database management. What’s the difference? Replication copies data and operations from a primary server to secondary servers to improve data availability. It’s particularly useful for disaster recovery, and it lets you distribute read queries among multiple nodes to improve read performance and reduce load on the primary. All write operations still go through the primary server, however, which can become a bottleneck.
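To make the replication flow concrete, here is a minimal toy model in plain Python. It is only a sketch under stated assumptions: the `Node` and `ToyReplicaSet` classes, the `replicate` step, and the round-robin read policy are all hypothetical illustrations and are not part of any MongoDB driver or API. It shows writes going through a single primary, secondaries replaying the primary's ordered log, and reads being spread across secondaries.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Node:
    """A toy database node holding key/value documents (not a MongoDB API)."""
    name: str
    data: dict = field(default_factory=dict)


class ToyReplicaSet:
    """Hypothetical model: one primary takes writes, secondaries replay its log."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.log = []  # ordered record of writes, loosely analogous to an oplog
        self._readers = cycle(secondaries)  # round-robin read distribution

    def write(self, key, value):
        # Every write is applied on the primary and recorded in the log.
        self.primary.data[key] = value
        self.log.append((key, value))

    def replicate(self):
        # Each secondary replays the primary's operations in order.
        for node in self.secondaries:
            for key, value in self.log:
                node.data[key] = value

    def read(self, key):
        # Reads are served by secondaries in turn to offload the primary.
        node = next(self._readers)
        return node.name, node.data.get(key)


rs = ToyReplicaSet(Node("primary"), [Node("secondary-1"), Node("secondary-2")])
rs.write("user:1", {"name": "Ada"})
rs.replicate()
first = rs.read("user:1")
second = rs.read("user:1")
print(first)   # served by secondary-1
print(second)  # served by secondary-2
```

Note that in this sketch a read issued before `replicate()` runs would return stale data from a secondary, which mirrors the trade-off of reading from secondaries in a real deployment.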

Conversely, sharding partitions a large database into smaller segments, known as shards, each housing a fraction of the complete data set on its own database server instance. Because the data set is distributed across multiple server instances, writes that target different shards are handled by each shard’s own primary server instance, which reduces the write bottleneck. To preserve data integrity and availability, each shard must itself implement replication.
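The partitioning idea can be sketched in a few lines of plain Python. This is a simplified illustration, not how MongoDB routes data internally: the shard names, the `shard_for` and `route_writes` helpers, and the hash-modulo rule are assumptions made for the example, whereas a real sharded cluster assigns ranges of shard key values to shards. The sketch shows only that a stable function of the shard key decides which shard receives each write, so the write load is divided among shards.

```python
import hashlib
from collections import defaultdict

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(shard_key_value: str) -> str:
    """Map a shard key value to a shard with a stable hash (toy routing rule)."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def route_writes(documents, key_field):
    """Group documents by the shard that would receive each write."""
    batches = defaultdict(list)
    for doc in documents:
        batches[shard_for(str(doc[key_field]))].append(doc)
    return dict(batches)


docs = [{"customer_id": i, "total": 10 * i} for i in range(12)]
batches = route_writes(docs, "customer_id")
for shard, batch in sorted(batches.items()):
    # Each shard's primary would handle only its own batch of writes.
    print(shard, len(batch))
```

Because the routing depends only on the shard key value, the same document always maps to the same shard, and how evenly the batches come out depends on how well the chosen key spreads across values.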

9.1 Ensuring data high availability with replication

9.1.1 Distinguishing replica set members

9.1.2 Electing primary replica-set member

9.1.3 Understanding the oplog collection

9.2 Understanding change streams

9.2.1 Connections for a change stream

9.2.2 Changing streams with Node.js

9.2.3 Modifying the output of a change stream

9.3 Scaling data horizontally through sharding

9.3.1 Viewing sharded cluster architecture

9.3.2 Creating sharded clusters via Atlas CLI

9.3.3 Working with a shard key

9.3.4 Choosing a shard key

9.3.5 Using a shard-key analyzer

9.3.6 Detecting shard-data imbalance or uneven data distribution

Summary