Chapter 7 · System Design Fundamentals

Distributed Unique ID Generator

Every distributed system needs IDs that are unique across hundreds of machines without funnelling every write through a single bottleneck. This is a compact, classic interview design — and the answer is mostly four numbers packed into one 64-bit integer.

Open the companion slides
Reading time ~10 min Prerequisites Ch 6, Ch 5 Audio 🔊 Hinglish read-aloud Next Design a URL Shortener

The crucial reframe: uniqueness can live in the structure of the ID, not in a server that hands them out. Once you stop asking a central counter for the next number, the single point of failure disappears, the network round trip disappears, and a few hundred machines can each mint millions of IDs a second that never collide.

Primary source

Alex Xu, System Design Interview (Vol. 1), Chapter 7. Companion deck: slides — jump to any slide with the Slide N chips. Go deeper: the original Twitter Snowflake source.

What we need from an ID Slide 2

Before picking a scheme, pin down what "good" looks like. Five properties shape every decision that follows — and the tension between them is the whole problem.

1 · Globally unique

No two machines, threads, or days may ever produce the same ID — not even by accident on a clock rewind.

2 · Roughly time-ordered

Newer IDs should sort after older ones, so a B-tree index stays happy and you can paginate by ID.

3 · Compact (64-bit)

64 bits fits a BIGINT column and a register; a 128-bit ID doubles storage on every row and index page.

4 · High throughput

Tens of thousands of IDs per second per node, sub-millisecond — no round trip to a central server on the hot path.

The fifth requirement is the one that kills the obvious answers: no single point of failure. If one node dies, the rest keep minting through partial outages, partitions, and rolling restarts.

Option 1 — Database auto-increment Slide 3

The default in every relational database: one table, one counter, one row at a time. Every INSERT grabs the next integer; the transaction log guarantees uniqueness. It is trivial to build and the IDs are dense, sorted, and small — and it works right up until you need a second machine.

Why it doesn't scale

The whole world serialises on one counter row, so write throughput caps at a single primary. Sharding the database breaks the global counter, and adding a coordinator just re-introduces the same bottleneck. There is also a network round trip on every insert that you can never optimise away. Fine for a side project; fatal across datacenters.

Option 2 — UUID v4, pure randomness Slide 4

Flip the strategy entirely: generate 128 random bits on each client. No coordination, no central server, no network call — collision probability is so small it might as well be zero. The cost is everything that randomness destroys.

The win — zero coordination

Generate offline, on a phone, in a serverless function. No SPOF and no central authority, because no two generators ever talk to each other.

The cost — 128 bits, no order

Twice the storage of a BIGINT. And the randomness means you cannot sort by ID to get time order — inserts hit random index pages and split them constantly.

UUID v7 fixes most of this by prefixing a millisecond timestamp — essentially Snowflake's idea wearing 128-bit clothing.

Option 3 — Ticket server Slide 5

Flickr's classic move: one tiny dedicated service whose only job is to hand out the next integer. Every app server asks it for an ID over the network. The IDs come back dense, monotonic, and small — and the logic is just an atomic counter, dead simple to audit.

But that single counter is the single point of failure. Every ID costs a network round trip, throughput is capped by one machine's increment rate, and failover is genuinely hard: a hot standby must never hand out a number the primary already issued, even after a network partition. You can batch out ranges to soften the latency, but the SPOF remains.

Option 4 — Snowflake, the winner Slide 6

Twitter's insight: if we slice a 64-bit integer into pre-agreed fields, every machine can mint IDs locally and still guarantee global uniqueness — no network call, no central counter. Two machines never collide because their machine-ID bits differ; one machine never collides with itself because either the millisecond advanced or its sequence counter did. And because the timestamp sits in the high bits, sorting by integer value sorts by time, for free.

no coordinator — each node mints locally Node · machine 1 ts · 1 · seq++ Node · machine 2 ts · 2 · seq++ Node · machine 3 ts · 3 · seq++ globally unique, time-sortable 64-bit IDs distinct machine-ID bits → no collisions, ever
Each node carries a distinct machine ID, so independently-minted IDs can never collide — no coordinator sits in the hot path.

The 64-bit layout Slide 7

One Snowflake ID is a signed 64-bit integer split into four fields. The width of each block below is proportional to its bit count.

bit 63 bit 0 0 TIMESTAMP 41 bits · ms since custom epoch 5 bit DC 5 bit m/c SEQUENCE 12 bits sign · always 0 ~69 years of values 2^41 ms ≈ 69.7 years 1,024 nodes 5 dc + 5 worker 4,096 / ms resets per ms EXAMPLE 0 | 0001011001…0110101 | 00011 00101 | 000000010111 = one big positive long
1 + 41 + 10 + 12 = 64. Time in the high bits makes the integer sort chronologically; the machine-ID bits make every node's stream disjoint.
per node = 4096 × 1000 ms = 4,096,000 IDs/sec
whole fleet = 4,096,000 × 1024 nodes = ≈ 4.2 billion IDs/sec

Failure modes: clocks and overflow Slide 8

Two hazards haunt every Snowflake implementation, and an interviewer will reach for both.

Sequence overflow (a burst)

If a node tries to mint more than 4,096 IDs in one millisecond, the 12-bit sequence is exhausted. The generator busy-waits for the next millisecond — usually a few hundred nanoseconds — then resumes from sequence 0. Invisible to callers.

Clock moving backwards

NTP slews, leap seconds, and VM migrations can pull the wall clock backwards. Naively reusing the smaller timestamp would mint IDs that look older than ones already issued — and risk duplicates.

Clock-rewind hazard

Remember the last timestamp used. If now < lastTimestamp, either wait for the clock to catch up (small skews) or refuse to generate and alert (large skews). Run NTP in slew-only mode, alarm when the offset crosses a few hundred milliseconds, and prefer a monotonic clock where the runtime exposes one. Gaps are cheap; duplicates are catastrophic.

Alternative — DB-segmented ranges Slide 9

A pragmatic middle ground used at Instagram and Flickr. The database hands out a block of N IDs (say 1,000) to each node, and the node serves them locally — one DB round trip per thousand IDs instead of per ID. The hot path is local and the resulting IDs are dense, sortable BIGINTs.

round trips
1 / 1000
per block, not per ID
on crash
gaps
unused range is lost
allocator
soft SPOF
touched rarely · replicate it
ordering
approx
node A on 1,800, B on 2,500

A crashed node loses the unused IDs in its block (gaps appear, which is fine), and the fleet is only roughly time-ordered — node A may still be on 1,800 while node B is already on 2,500.

Four approaches side by side Slide 10

The same options scored on the requirements from the top — coordination, sortability, size, and the all-important single point of failure.

ApproachCoordinationSortable by time?SizeSingle point of failure?
Auto-incrementDB row lockYes64 bitsYes — the one primary
UUID v4NoneNo — random128 bitsNo
Ticket serverCentral RPCYes64 bitsYes — the counter
SnowflakeMachine-ID onlyYes64 bitsNo

Rule of thumb: a handful of nodes over an existing SQL stack — segmented ranges are the cheapest win. Hundreds of producers wanting zero hot-path coordination — ship Snowflake.

Principles to carry forward Slide 11

Coordinate as little as possible

Every network hop on the ID path is latency you never get back. Push uniqueness into the structure of the ID itself.

Put time in the high bits

Sortable IDs make indexes, pagination, and range scans easier — for free, just by where the timestamp lives.

Budget your bit width

Decide how many years, how many machines, and how many IDs per millisecond you actually need — then size the fields.

Treat clocks as adversarial

Wall clocks drift, slew, and occasionally reverse. Detect regression, refuse to mint, and alert loudly. Optimise for the right failure mode — gaps over duplicates.

Active recall

Cover the answers. Say each one out loud before you tap to check.

Name the five requirements for a distributed ID.
One of them kills the easy answers.
Globally unique, roughly time-ordered, compact (64-bit), high throughput, and no single point of failure.
Why does DB auto-increment fail across datacenters?
Everything funnels through one place.
The whole fleet serialises on one counter row, so throughput caps at a single primary. Sharding breaks the global counter, and a coordinator just re-creates the same bottleneck.
UUID v4 needs no coordinator — so what's the catch?
Two costs.
It is 128 bits (double a BIGINT) and pure randomness, so you can't sort by ID to get time order and inserts split random index pages.
Lay out the 64 Snowflake bits.
It adds to 64.
1 sign + 41 timestamp + 10 machine (5 datacenter + 5 worker) + 12 sequence.
What happens when the sequence overflows in one millisecond?
The 12-bit field maxes out.
After 4,096 IDs in a millisecond, the node busy-waits for the next millisecond, then resumes from sequence 0. A few hundred nanoseconds, invisible.
The clock jumps backwards. What do you do?
Compare against the last timestamp.
If now < lastTimestamp, wait for it to catch up (small skew) or refuse and alert (large skew). Never reuse a regressed timestamp — duplicates are catastrophic, gaps are cheap.

Check yourself

Q1 Which requirement rules out a single ticket server?
Why: The lone counter is exactly the single point of failure — every ID flows through it, and its failover is hard to make safe.
Q2 How are the 64 bits of a Snowflake ID divided?
Why: One sign bit, a 41-bit timestamp, 10 machine-ID bits, and a 12-bit sequence sum to exactly 64.
Q3 Compared with Snowflake, the big sortability cost of UUID v4 is that it…
Why: UUID v4 is pure randomness with no embedded timestamp, so sorting by the value tells you nothing about time — Snowflake puts time in the high bits.
Q4 A node needs more than 4,096 IDs in one millisecond. What happens?
Why: The 12-bit sequence holds 4,096 values; once exhausted the node busy-waits for the next millisecond and resets the sequence to zero.
Q5 Why prefer skipping IDs over risking duplicate IDs?
Why: A gap is a harmless missing number; a duplicate breaks the one invariant the system exists to provide. Always optimise for gaps over collisions.