2 Review of hash tables and modern hashing

 

This chapter covers

  • Reviewing dictionaries and why hashing is ubiquitous in modern systems
  • Refreshing the basic collision-resolution techniques
  • Exploring cache efficiency in hash tables
  • Using hash tables for distributed systems and consistent hashing
  • Learning how consistent hashing works in P2P networks

We begin with the topic of hashing for a number of reasons. First, classical hash tables have proved irreplaceable in modern systems, making it harder to find a system that does not use them than one that does. Second, recently there has been a lot of innovative work addressing algorithmic issues that arise as hash tables grow to fit massive data, such as efficient resizing, compact representation, and space-saving tricks. In a similar vein, hashing has over time been adapted to serve in massive peer-to-peer systems where the hash table is split among servers; here, the key challenge is the assignment of resources to servers and the load balancing of resources as servers dynamically join and leave the network. Lastly, we begin with hashing because it forms the backbone of all the succinct data structures we present in part 1 of the book.

2.1 Ubiquitous hashing

2.2 A crash course on data structures

2.3 Usage scenarios in modern systems

2.3.1 Deduplication in backup/storage solutions

2.3.2 Plagiarism detection with MOSS and Rabin-Karp fingerprinting

2.4 O(1)—What's the big deal?

2.5 Collision resolution: Theory vs. practice

2.6 Usage scenario: How Python’s dict does it

2.7 MurmurHash

2.8 Hash tables for distributed systems: Consistent hashing

2.8.1 A typical hashing problem

2.8.2 Hashring

2.8.3 Lookup

2.8.4 Adding a new node/resource