Chapter 5 · System Design Fundamentals

Consistent Hashing

Sharding data and spreading cache keys across a fleet of servers is only easy until the fleet changes size. Consistent hashing is the technique that lets a cluster grow, shrink, or lose a node while moving only a small slice of the keys — and it is a building block that surfaces again and again in interviews.

▶ Open the companion slides

Reading time ~11 min Prerequisites Ch 4 Audio 🔊 Hinglish read-aloud Next Design a Key-Value Store

The crucial reframe: the hard part of distribution is not the first placement — it is the re-placement. Any hash function can spread keys evenly across a fixed set of servers. The question that separates a toy from a production system is what happens the moment a server is added, removed, or dies. Consistent hashing answers that question by minimizing how much the assignment has to change.

Primary source

Alex Xu, System Design Interview (Vol. 1), Chapter 5. Companion deck: slides — jump to any slide with the Slide N chips. Go deeper: the original idea comes from Karger et al., "Consistent Hashing and Random Trees" (STOC 1997).

The problem: mod-N breaks on resize Slide 2

The naive way to spread keys over N servers is server = hash(key) mod N. It is fast and even — until N changes. Add or remove a single node and the modulus changes for every key, so almost all of them land on a different bucket. For a cache that is a stampede of misses hitting the origin; for a database it is a wholesale rebalance and downtime.

server = hash(key) mod N → change N, and roughly (N−1)/N of all keys move

Under mod-N, a single new server reshuffles most of the cache at once — roughly (N−1)/N of all keys.

The fix: bend the hash space into a ring Slide 3

Pick a hash function with a large output space — say the 128-bit range of MD5. Lay that range end to end and join the two ends into a circle. Every possible hash value is now a single point on the ring, and servers and keys both map onto the same circle through the same hash function. Crucially, a server's spot depends only on its identifier, never on the count of nodes — so resizing no longer touches the formula.

One continuous, wrap-around coordinate system. No routing table — the hash function is the map.

Servers claim positions by hashing themselves Slide 4

Compute hash(A), hash(B), and so on for each node's stable identifier — its IP, hostname, or a synthetic label. Each output lands the node somewhere on the circle. By convention, a server owns the arc of the ring that ends at its position: the next server clockwise marks the boundary of its territory. Most implementations give every node a copy of the membership list, so any node can answer a lookup locally without a central coordinator.

A key belongs to the first server clockwise Slide 5

Hash the key, drop it onto the ring, then walk clockwise until you hit a server — that server owns the key. The walk is conceptual; in code it is a binary search over the sorted node positions, so lookup is O(log N). The same rule covers reads, writes, and replica placement, and there is no seam: a key past the last server simply wraps past 0 to the first.

Each arc is the territory of the server at its clockwise end. k1 walks past A's gap and stops at B.

Adding a node rearranges only one arc Slide 6

Suppose a new server E joins between B and C. Every key that used to walk past E to reach C now stops at E instead — and nothing else moves. Keys elsewhere on the ring keep the same successor. On average, joining a cluster of N nodes relocates only about 1/N of the keys, versus nearly all of them under mod-N.

Removing a node folds its arc into the next Slide 7

Removal is the mirror image. If a server fails or is decommissioned, every key it owned was, by definition, in the arc ending at its position. Those keys simply walk one step further to the next surviving server clockwise; no other arc is disturbed. A failure is just a node deletion — convenient, but, as the next section shows, also exactly where the trouble starts.

Add node E between B and C

Only the keys in the B → E arc reassign (from C to E). Everything else is untouched.

Remove node B

B's keys walk forward to C, the next server clockwise. A and D keep their arcs.

Net effect

A join or leave touches ≈ 1/N of the keyspace — not (N−1)/N.

Side by side: what moves when membership changes

The whole value of the ring is captured in one column — the fraction of keys that have to relocate when a single node joins or leaves.

Property	Mod-N hashing	Consistent hashing (ring)
Keys moved on a node join	≈ (N−1)/N	≈ 1/N
Keys moved on a node leave	≈ (N−1)/N	≈ 1/N
Routing table to maintain	none, but recompute on resize	none — the hash is the map
Lookup cost	O(1)	O(log N)
Even load by default	yes, for fixed N	no — needs virtual nodes

The catch: a small ring is lumpy Slide 8

A uniform hash spreads keys evenly only on average. With just a handful of randomly placed points on a huge circle, some arcs end up far wider than others — and the server at the clockwise end of a wide arc owns a disproportionate share of the keyspace and its traffic. Worse, when a node fails its entire arc lands on a single neighbor, whose load can instantly double and knock it over too.

Random is not even

With N = 4 or N = 8 real servers, variance dominates: gaps of 2× or 3× the mean are common, and skewed real-world traffic only compounds a hot arc. The fix is to put far more points on the ring.

Virtual nodes smooth the load Slide 9

Instead of placing each server once, place it dozens or hundreds of times. Each replica is a virtual node, generated by hashing labels like A#0, A#1, A#2, and so on — all still owned by the same physical server. The law of large numbers now works for us: each physical server ends up owning roughly the same total arc length, and when one dies, its many small arcs are absorbed by many different neighbors rather than one.

Each color is one physical server, scattered across the ring. More points → flatter load and failures that spread, not cascade.

Heterogeneous capacity, easily

Give a beefier machine more virtual nodes than a small one. Capacity is dialed in by changing a single integer per server.

The cost: a bigger ring

50 servers × 200 vnodes = 10,000 positions. Lookups stay logarithmic, but the sorted structure uses more memory on every node.

Where you'll meet it in real infra Slide 10

The pattern shows up wherever a fleet of servers must agree on which one owns which piece of data while the fleet churns underneath. Three families dominate.

Distributed caches

Clients hash request keys onto a ring of cache servers; adding or rebooting one box re-warms only its share, not the whole fleet. Memcached client libraries, Ketama, Twemproxy.

Partitioned databases

Each row's primary key is hashed onto a ring; the owning node holds the data and forwards copies to a few clockwise neighbors as replicas. Amazon Dynamo, Cassandra, Riak, ScyllaDB.

Edge and CDNs

An edge layer hashes an object's URL to pick which cache node serves it; nodes join and leave with most content staying put. Akamai-style edges, Cloudflare, Discord guild sharding.

The common thread

Stateless or partly-stateful servers, constant churn, and a need to keep most data exactly where it is during a rolling deploy or a single failure.

Tradeoffs: it solves one problem and adds smaller ones Slide 11

Adopting the ring is rarely an afternoon's work. None of the following are showstoppers, but they are the issues that consume the rest of the design.

Membership and gossip

Nodes must learn quickly when peers join or leave. Most clusters layer a gossip protocol on the ring to converge the membership view — an extra subsystem with its own failure modes.

Replica placement

The next clockwise nodes may share a rack, zone, or power domain. Production rules skip neighbors until rack and region diversity is satisfied.

Bounded load

Even with virtual nodes the spread is statistical and some shards run hot. Modern variants add per-node capacity caps and overflow to the next node.

Rebalance is data movement

Only a small share of keys move, but that share is still physically copied. Throttle the transfer so live reads and writes keep their bandwidth.

Active recall

Cover the answers. Say each one out loud before you tap to check.

Why does mod-N hashing break when the cluster resizes?

The modulus changes.

Changing N changes hash(key) mod N for almost every key, so roughly (N−1)/N of all keys remap at once — a cache stampede or a wholesale database rebalance.

What does "bend the hash space into a ring" mean?

Join the ends.

Take a large hash range, lay it end to end, and join it into a circle. Servers and keys both map onto the same ring via the same hash function; the node count never enters the formula.

How do you find which server owns a key?

One direction.

Hash the key, drop it on the ring, and walk clockwise to the first server you meet. In code it is a binary search over sorted node positions — O(log N).

How many keys move when a node joins or leaves?

Compare to mod-N.

On average about 1/N — only the affected arc rearranges. A join reassigns the new node's arc; a leave folds its arc into the next server clockwise.

Why are virtual nodes needed?

Few points are lumpy.

With few points the ring is uneven and one wide arc becomes a hot spot. Giving each server many positions evens out arc sizes and spreads a failure's load across many neighbors.

Name two systems built on consistent hashing.

Cache or DB.

Distributed caches (Memcached clients, Ketama), partitioned databases (Dynamo, Cassandra, Riak), and CDN/edge routing. Anywhere a churning fleet must agree on key ownership.

Check yourself

Q1 Why does hash(key) mod N behave badly on resize?

Why: Changing N changes the modulus for nearly every key, so about (N−1)/N of them land on a different bucket — a stampede or a full rebalance.

Q2 On a hash ring, which server owns a given key?

Why: Drop the key on the ring and walk clockwise; the first server you meet owns it. In practice a binary search over sorted positions.

Q3 Roughly how many keys move when one node joins the ring?

Why: Only the arc the new node lands in is reassigned, so on average about 1/N of keys move — versus (N−1)/N under mod-N.

Q4 What problem do virtual nodes primarily solve?

Why: Few points give lumpy arcs and hot spots. Many positions per server even out arc sizes and spread a failure across many neighbors.

Q5 When a node on the ring fails, what happens to its keys?

Why: Its arc folds into the next surviving server clockwise; every other arc is untouched. A failure is just a node deletion.