A scheme for spreading keys across a changing set of servers — so that adding or losing a node shifts only a small slice of the data, rather than every key in the system.
If you map keys to servers with server = hash(key) mod N, the assignment is fast and even — until N changes. Adding or removing a single node reshuffles nearly every key. For a cache, that is a stampede; for a database, that is downtime.
Each key is hashed to a number, then taken modulo the server count. With 4 servers, key k lives on bucket hash(k) % 4.
Move from 4 servers to 5 and the modulus changes. Roughly (N-1)/N of all keys land on a different bucket. At 4→5 that is about 80% of the cache invalidated at once.
Origin servers see a thundering herd. Warm caches go cold. Replicated data must be rebalanced wholesale. The cluster takes minutes or hours to recover.
Pick a hash function with a large output space — for instance the 128-bit space of MD5. Imagine its full range laid end to end and joined into a circle. Every possible hash value is a single point on that circle. Servers and keys both live on the same ring.
Both nodes and keys map to positions on the ring through the same hash function. There is no separate routing table to maintain.
A server's spot on the ring depends only on its identifier — its IP, hostname, or a chosen label. The total number of nodes never enters the calculation.
Typical hash spaces hold 2^32 or 2^128 positions. A real cluster of dozens of servers occupies a tiny, scattered fraction of the circle.
Compute hash(server-A), hash(server-B), and so on. The output lands each node somewhere on the ring. Because the hash function distributes outputs roughly uniformly, the nodes scatter around the circle — though, as we will see, never perfectly evenly.
The label hashed is up to you — IP address, DNS name, or a synthetic ID. What matters is that it is stable; the same node always lands in the same spot.
By convention, a server is responsible for the segment of the ring ending at its position. The next server clockwise marks the boundary of its territory.
With only a handful of points, hash randomness leaves big arcs and small arcs. Some servers naturally end up with more of the ring than others — a problem we will address shortly.
In most implementations, every node carries a copy of the membership list. Lookups can be answered locally without a central coordinator.
Hash the key. Drop it onto the ring at that position. Then move in the agreed direction — clockwise by convention — until you encounter a server. That server owns the key. The walk is purely conceptual; in practice it is a binary search over the sorted list of node positions.
Reads, writes, and replica placement all use the same rule. There is no separate routing table that must stay in sync with membership.
Each node holds the sorted positions of all peers. Finding the successor of a key's hash is a binary search — fast even with thousands of virtual nodes.
If a key's hash falls past the last server's position, the walk continues past 0 and lands on the first server. The ring has no seam.
Suppose a new server E joins between B and C. Every key that previously walked past E to reach C now stops at E instead. Keys living anywhere else on the ring are untouched — their successor server is unchanged.
Keys k1, k2, k3 all walk clockwise to land on C.
k1, k2, k3 now stop at E. All other keys keep the same owner.
On average, joining a cluster of N nodes moves only about 1/N of the keys — compared to nearly all of them under mod-N hashing.
If a server fails or is decommissioned, every key it owned was, by definition, in the arc ending at its position. Those keys are simply walked one step further around the ring — to the next surviving server. No other arc is disturbed.
k1 and k2 fall in the arc owned by B.
k1 and k2 walk past where B was and stop at C. A and D are untouched.
Failure is just a node deletion. The neighbor inherits the load — which is convenient and, as we will see next, also exactly where the trouble starts.
With only a handful of randomly placed points on a huge circle, some arcs are inevitably much wider than others. The server at the clockwise end of a wide arc owns a disproportionate share of the keyspace — and therefore traffic. Worse, when a node fails, its entire arc falls onto a single neighbor, which may already be busy.
A uniform hash function spreads on average. With only N=4 or N=8 actual servers, the variance dominates — gaps of 2× or 3× the mean are common.
When a server dies, the whole of its arc lands on a single neighbor. That neighbor's load instantly doubles, which can knock it over too.
Real traffic is rarely uniform across the keyspace either. A hot arc on a small ring is the worst of both worlds.
Increasing the number of positions per server smooths the distribution. That is the idea behind virtual nodes.
Instead of placing each server once, place it dozens, hundreds, or even thousands of times. Each replica is a virtual node, generated by hashing labels like A#0, A#1, A#2, and so on. The physical server still owns each of those positions, but the law of large numbers now works in our favor.
With many replicas per node, each physical server ends up owning roughly the same total arc length — even though individual arcs vary.
When a physical node dies, its many virtual positions vanish from the ring. Their arcs are absorbed by many different neighbors, not just one.
A beefier machine can be given more virtual nodes than a smaller one. Capacity is dialed in by changing a single integer per server.
A cluster of 50 servers with 200 vnodes each has 10,000 positions. Lookups are still logarithmic, but the structure consumes more memory on every node.
The pattern shows up wherever a fleet of stateless or partly-stateful servers must agree on which one is responsible for which piece of data, with churn happening underneath. The same hash-ring idea appears in caches, in databases, and at the edges of content networks.
Cache clients hash request keys onto a ring of cache servers. When a cache box is added or rebooted, only its share of the keyspace re-warms instead of the whole fleet — which is the difference between a hiccup and an outage at the origin.
Each row's primary key is hashed onto a ring of storage nodes. The owning node holds the data and forwards copies to a small fixed number of clockwise neighbors as replicas. Adding capacity moves only the affected partitions.
A CDN's edge layer hashes the URL of an object to pick which cache node should serve it. As nodes join and leave the pool, the bulk of cached content stays exactly where it was — keeping hit rates high even during rolling deploys.
Adopting the ring is rarely a single afternoon's work. Membership has to be tracked, replicas have to be placed thoughtfully, and rebalancing has to be paced. None of these are showstoppers — but they are the issues that consume the rest of the design.
Every node must learn quickly when peers join or leave. Most clusters layer a gossip protocol on top of the ring to converge the membership view. That is an extra subsystem with its own failure modes.
Putting copies on the next clockwise nodes is simple, but those nodes may share a rack, a zone, or a power domain. Production systems extend the basic rule to skip over neighbors until rack and region diversity is satisfied.
Even with virtual nodes, the distribution is statistical. Some shards run hot. Modern variants add capacity caps on each node and overflow to the next — at the cost of a slightly more complex lookup.
Adding a node only shifts a small share of keys — but that share still has to be physically copied. The transfer must be throttled so that ongoing reads and writes are not starved of bandwidth.
Wrap the output of your hash function end to end. Servers and keys both live as points on the same circle.
One rule for placement, lookup, and replica selection. No external routing table to keep in sync.
On average, about 1/N of the keys move when a node arrives or departs — versus nearly all of them under mod-N.
Replicating each physical node hundreds of times around the ring evens out arc sizes and spreads failures across many neighbors.
Walk the ring to choose replicas, but skip over peers that share a rack or zone with one already chosen.
The math is gentle, but the network is not. Throttle transfers so live traffic keeps its share of the pipe.