From a single auto-increment column to 64-bit Snowflake IDs minted independently on hundreds of machines — without ever colliding.
Before picking a scheme, pin down what "good" looks like. Five properties shape every design decision that follows.
The default in every relational database. One table, one counter, one row at a time. It works until it doesn't.
The database keeps an internal counter. Every INSERT grabs the next integer and hands it back. Uniqueness is guaranteed because the engine serializes every increment of that counter, so no two rows can receive the same value.
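A minimal sketch of this behaviour, using Python's built-in `sqlite3` module with an in-memory database. The `users` table and the `insert_user` helper are hypothetical names chosen for illustration; other engines expose the same idea through different syntax.

```python
import sqlite3

# In-memory database; the engine owns the counter behind the id column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)


def insert_user(name):
    """Insert a row and return the ID the database assigned to it."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


ids = [insert_user(name) for name in ("ada", "bob", "cy")]
print(ids)
```

Every writer has to go through this one database to obtain an ID, which is the bottleneck the later approaches try to remove.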
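Generate a 128-bit identifier on each client (a version 4 UUID carries 122 random bits; the remaining 6 encode version and variant). No coordination, no central server, no network call. Even after a billion IDs, the chance of any collision is on the order of 1 in 10^19.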
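To put a number on "so small", here is a rough sketch using the birthday approximation p ≈ 1 - exp(-n² / 2^(b+1)) for n IDs with b random bits. It assumes version 4 UUIDs, which carry 122 random bits out of 128; `collision_probability` is a hypothetical helper, not a library function.

```python
import math
import uuid


def collision_probability(n, random_bits=122):
    """Birthday-bound estimate of at least one collision among n random IDs."""
    exponent = (n * n) / 2.0 ** (random_bits + 1)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-exponent)


new_id = uuid.uuid4()          # generated locally, no coordination
p_billion = collision_probability(10**9)
print(new_id, p_billion)
```

The estimate grows with the square of the number of IDs, so it stays negligible for realistic volumes but is never exactly zero.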
Flickr's classic move: a tiny dedicated service whose only job is to hand out the next integer. Every app server asks it for an ID over the network.
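The core of such a service can be sketched as a lock-guarded counter. This is an illustrative stand-in only: the `TicketServer` class is hypothetical, the network hop is omitted, and a real ticket server would persist its counter so that a restart never reissues old values.

```python
import threading


class TicketServer:
    """Hands out the next integer; a lock serializes concurrent callers."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            value = self._next
            self._next += 1
            return value


server = TicketServer()
issued = []


def app_server(count):
    # Each simulated app server asks the ticket server for IDs one at a time.
    for _ in range(count):
        issued.append(server.next_id())


threads = [threading.Thread(target=app_server, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every ID costs one round trip to this single component, which makes it both a latency cost and a single point of failure.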
Twitter's insight: if we slice a 64-bit integer into pre-agreed fields, every machine can mint IDs locally and still guarantee global uniqueness. No network call, no central counter.
Each machine knows its own ID and tracks its own per-millisecond sequence. Two machines can never collide because their machine-ID bits differ. The same machine can't collide with itself because either the millisecond has advanced or the sequence counter has.
Sorting by the integer value sorts by time (because the timestamp sits in the high bits) — exactly what B-tree indexes want.
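The packing can be sketched in a few lines. The field widths below (41 timestamp bits, 10 machine bits, 12 sequence bits, with the top bit left unused) follow Twitter's original layout and are an assumption here; `compose` is a hypothetical helper name.

```python
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def compose(ms_since_epoch, machine_id, sequence):
    """Pack the three fields into one integer, timestamp in the high bits."""
    if not 0 <= ms_since_epoch < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (ms_since_epoch << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


# Same millisecond, different machines: the machine bits keep the IDs apart.
a = compose(1000, machine_id=1, sequence=0)
b = compose(1000, machine_id=2, sequence=0)

# A later millisecond always yields a larger integer, whatever the low bits are.
earlier = compose(1000, machine_id=1023, sequence=4095)
later = compose(1001, machine_id=0, sequence=0)
```

Because the timestamp occupies the most significant bits, integer comparison and time comparison agree at millisecond granularity; within one millisecond the order across machines is arbitrary.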
A scaled-up diagram of one Snowflake ID. The width of each block is exactly proportional to the number of bits it occupies.
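The same layout can be expressed in code by unpacking an ID back into its fields. The widths assume Twitter's original split (1 unused sign bit, 41 timestamp bits, 10 machine bits, 12 sequence bits); `decode`, `SnowflakeParts`, and `layout_diagram` are hypothetical names for illustration.

```python
from typing import NamedTuple

# (label, width in bits), from most significant to least significant.
FIELDS = (("0", 1), ("T", 41), ("M", 10), ("S", 12))


class SnowflakeParts(NamedTuple):
    timestamp_ms: int  # milliseconds since the custom epoch
    machine_id: int
    sequence: int


def decode(snowflake_id):
    """Split a 64-bit Snowflake ID into its three fields."""
    sequence = snowflake_id & 0xFFF                       # low 12 bits
    machine_id = (snowflake_id >> 12) & 0x3FF             # next 10 bits
    timestamp_ms = (snowflake_id >> 22) & ((1 << 41) - 1) # next 41 bits
    return SnowflakeParts(timestamp_ms, machine_id, sequence)


def layout_diagram():
    """One character per bit, so each field's width is proportional."""
    return "".join(label * width for label, width in FIELDS)


print(layout_diagram())
print(decode((5 << 22) | (3 << 12) | 7))
```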
Two failure modes haunt every Snowflake implementation: a node trying to mint more than 4,096 IDs in one millisecond, and a system clock that jumps backwards.
If the sequence counter reaches its ceiling (4,095) within the same millisecond, the generator must wait for the next millisecond before producing more IDs. This is a busy-wait that lasts at most about one millisecond, so callers see only a brief stall rather than an error.
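A sketch of a generator that handles sequence exhaustion by spinning until the clock ticks over. The class name, the injectable `clock` parameter, and the default epoch (Twitter's original custom epoch) are assumptions made for illustration; a backwards clock is simply rejected here.

```python
import threading
import time

MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


def wall_clock_ms():
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    def __init__(self, machine_id, clock=wall_clock_ms, epoch_ms=1288834974657):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self._clock = clock
        self._epoch_ms = epoch_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence, wrapping at 4096.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 values used: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                # New millisecond: the sequence starts again at zero.
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self._epoch_ms) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Passing the clock in as a parameter is a testing convenience: a fake clock can hold time still to force the overflow path without generating thousands of IDs in real time.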
NTP step corrections, leap seconds, and VM migrations can pull the system clock backwards. (A slew only speeds up or slows down the clock; a step actually moves it.) If we naively used the new, smaller timestamp, we'd mint IDs that look older than ones already issued, and possibly duplicate them.
Defensive strategy: remember the last timestamp used. If now < lastTimestamp, either wait until the clock catches up (small skews) or refuse to generate IDs and alert (large skews). Some implementations also use the high bits of the machine ID as a logical-clock guard so that a restart with a regressed clock can be detected.
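The wait-or-refuse policy might look like the sketch below. The 5 ms threshold, the `safe_timestamp` helper, and the `ClockMovedBackwards` exception are all hypothetical choices for illustration; the right threshold and the alerting hook depend on the deployment.

```python
import time


class ClockMovedBackwards(RuntimeError):
    """Raised when the clock regressed further than we are willing to wait."""


def wall_clock_ms():
    return time.time_ns() // 1_000_000


def safe_timestamp(last_ms, clock=wall_clock_ms, max_wait_ms=5, sleep=time.sleep):
    """Return a timestamp >= last_ms, or raise if the skew is too large."""
    now = clock()
    if now >= last_ms:
        return now
    drift = last_ms - now
    if drift > max_wait_ms:
        # Large skew: refuse to mint IDs so an operator can investigate.
        raise ClockMovedBackwards(f"clock is {drift} ms behind last issued ID")
    # Small skew: wait for the clock to catch up, then check once more.
    sleep(drift / 1000.0)
    now = clock()
    if now < last_ms:
        raise ClockMovedBackwards("clock still behind after waiting")
    return now
```

A generator would call `safe_timestamp(self._last_ms)` in place of reading the clock directly, and would persist or otherwise remember the last timestamp if it also needs to detect regressions across restarts.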
A pragmatic middle ground used at Instagram, Flickr, and inside many internal services. The database hands out blocks of N IDs to each node, and the node serves them locally.
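A sketch of block allocation, with an in-memory object standing in for the database. `BlockAllocator` and `Node` are hypothetical names; in practice the reservation would be a single atomic update against a counter row.

```python
import threading


class BlockAllocator:
    """Stand-in for the database: reserves contiguous ranges of IDs."""

    def __init__(self, block_size, start=1):
        self._block_size = block_size
        self._next = start
        self._lock = threading.Lock()

    def reserve(self):
        # Returns a half-open range [first, end) that no other node will get.
        with self._lock:
            first = self._next
            self._next += self._block_size
            return first, first + self._block_size


class Node:
    """Serves IDs from its current block and refills only when it runs dry."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._current = 0
        self._end = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._current >= self._end:
                self._current, self._end = self._allocator.reserve()
            value = self._current
            self._current += 1
            return value


allocator = BlockAllocator(block_size=3)
a, b = Node(allocator), Node(allocator)
sequence = [a.next_id(), b.next_id(), a.next_id(), a.next_id(), a.next_id()]
print(sequence)
```

The trade-offs are visible in the output: IDs are unique but not globally ordered by time, and a node that crashes mid-block leaves a gap in the sequence.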
Same five approaches, scored on the requirements we wrote down on slide two.
Whichever scheme you ship, these ideas survive the choice.