Chapter 9. Sharding


In this chapter

  • Sharding concepts
  • Setting up and loading a sample shard cluster
  • Administration and failover

MongoDB was designed from the start to support sharding. This has always been an ambitious goal, because building a system that supports automatic range-based partitioning and balancing, with no single point of failure, is hard. As a result, production-level sharding support did not arrive until August 2010, with the release of MongoDB v1.6. Since then, numerous improvements have been made to the sharding subsystem. Sharding enables users to keep large volumes of data evenly distributed across nodes and to add capacity as needed. In this chapter, I’ll present the layer that makes this possible in all its glory.

We’ll begin with a sharding overview, discussing what sharding is, why it’s important, and how it’s implemented in MongoDB. Although this will give you a basic working knowledge of sharding, you won’t fully understand it until you set up your own sharded cluster. That’s what you’ll do in the second section, where you’ll build a sample cluster to host data from a massive Google Docs-like application. We’ll then discuss some sharding mechanics, describing how queries and indexing work across shards. Next, we’ll look at the ever-important choice of shard key. I’ll end the chapter with specific advice on running sharding in production.

9.1. Sharding overview

9.2. A sample shard cluster

9.3. Querying and indexing a shard cluster

9.4. Choosing a shard key

9.5. Sharding in production

9.6. Summary