Chapter 7 · Audio Lesson
Distributed Unique ID Generator
Every distributed system needs unique ids that can be minted across hundreds of machines without a single bottleneck.
What we need Slide 2
Chapter seven, the distributed unique id generator. Before choosing a scheme, decide what a good id looks like. Five properties shape every decision. One, the id is globally unique: no two machines ever produce the same id. Two, the id is roughly time-ordered, meaning a newer id sorts after an older one. Three, the id is compact: sixty-four bits is enough. Four, throughput is high: each node mints thousands of ids per second without a round trip to a central server. And five, there is no single point of failure: if one node dies, the rest keep minting. It is this fifth requirement that kills the easy answers.
Option one, auto-increment Slide 3
The first option, database auto-increment. This is the default in every relational database. One table, one counter, one row at a time. Each insert takes the next integer, and the database's transaction machinery guarantees uniqueness. It is trivial to build, and the ids are dense, sorted, and small. The problem is that the whole world serializes on a single counter row, so write throughput is capped at what one primary can handle. Shard the database and the global counter breaks; add a coordinator and the same bottleneck comes right back. Fine for a side project, but fatal across datacenters.
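To make the single counter concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database standing in for the single primary. The table name orders and the helper insert_order are illustrative assumptions, not part of the lesson.

```python
import sqlite3

# In-memory database standing in for the single primary (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)"
)


def insert_order(item: str) -> int:
    """Insert one row and return the id the database assigned to it."""
    cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    conn.commit()
    return cur.lastrowid


# Every insert goes through the same counter, one row at a time.
ids = [insert_order(name) for name in ("tea", "coffee", "juice")]
```

The ids come back dense and ordered, but every writer had to go through the same connection and the same counter, which is exactly the serialization point described above.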
Option two, UUID Slide 4
The second option, UUID version four, pure randomness. Flip the strategy completely: generate a one-hundred-twenty-eight-bit value on every client, of which one hundred twenty-two bits are random and six are fixed version and variant bits. No coordination, no central server, no network call. The probability of a collision is so small that you can treat it as zero. The big win is that there is no single point of failure, because no two generators ever talk to each other. But there is a cost. A UUID is one hundred twenty-eight bits, double the storage of a BIGINT. And because the value is random, sorting by id tells you nothing about time order, and inserts land on random index pages and keep splitting them. UUID version seven fixes this by putting a timestamp prefix in front, which is essentially the Snowflake idea.
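How small is "so small"? The sketch below mints one version four UUID with Python's standard uuid module and estimates the collision probability with the usual birthday-bound approximation, assuming the 122 random bits are uniformly distributed. The helper name collision_probability is an illustrative assumption.

```python
import math
import uuid

# A version 4 UUID fixes 6 of its 128 bits (version and variant),
# leaving 122 bits of randomness.
RANDOM_BITS = 122


def collision_probability(n_ids: int, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n_ids * (n_ids - 1) / 2 ** (bits + 1))


new_id = uuid.uuid4()  # no coordination, no network call

# Chance of at least one collision among a trillion ids.
p_trillion = collision_probability(10 ** 12)
```

Under that approximation, even a trillion ids leave the collision probability far below one in a trillion, which is why the lesson treats it as zero.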
Option three, ticket server Slide 5
The third option, the ticket server. This is Flickr's classic approach: a small dedicated service whose only job is to hand out the next integer. Every app server asks it for an id over the network. The ids come out dense, monotonic, and small, and the logic is just one atomic counter, so it is easy to audit. But that single counter is a single point of failure. Every id costs a network round trip, throughput is capped at one machine, and failover is genuinely hard, because a hot standby must never hand out a number the primary has already issued, not even after a network partition. Handing out ranges in batches cuts the latency, but the single point of failure remains.
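The core of a ticket server can be sketched in a few lines. This is an in-process stand-in, not Flickr's implementation: the class name TicketServer and the requests_served counter are illustrative assumptions, and each next_id() call represents what would be one network round trip in a real deployment.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TicketServer:
    """Stand-in for the central service: a single atomic counter."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._lock = threading.Lock()
        self.requests_served = 0  # one per id, i.e. one round trip per id

    def next_id(self) -> int:
        # The lock makes read-and-increment atomic across callers.
        with self._lock:
            ticket = self._next
            self._next += 1
            self.requests_served += 1
            return ticket


server = TicketServer()
# Eight "app servers" hammer the same counter concurrently.
with ThreadPoolExecutor(max_workers=8) as pool:
    tickets = list(pool.map(lambda _: server.next_id(), range(1000)))
```

Every caller queues on the same lock, so the ids are dense and unique, but the counter is both the throughput ceiling and the thing that must never be lost.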
Option four, Snowflake Slide 6
The fourth option, Snowflake, and this is the winner. Twitter's insight was that if we slice a sixty-four-bit integer into pre-agreed fields, every machine can mint ids locally and global uniqueness is still guaranteed. No network call, no central counter. Two machines never collide because their machine id bits differ. A single machine never collides with itself because either the millisecond has advanced or the sequence counter has. And because the timestamp sits in the high bits, sorting by integer value becomes sorting by time, completely free.
The sixty-four-bit layout Slide 7
Now the layout of a Snowflake id, in words. It is a signed sixty-four-bit integer, split into four fields. First comes one sign bit, which always stays zero so the value stays positive. Then a forty-one-bit timestamp, which holds milliseconds since a custom epoch and gives a range of about sixty-nine years. Then a ten-bit machine id, which identifies one thousand twenty-four nodes and is usually split into a five-bit datacenter id and a five-bit worker id. And last, a twelve-bit sequence number, which gives each machine four thousand ninety-six ids per millisecond and resets to zero at every new millisecond. The total throughput of one node is four thousand ninety-six ids per millisecond times one thousand milliseconds per second, roughly four point one million ids per second.
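The layout above maps directly onto shifts and masks. Here is a minimal sketch that packs and unpacks the fields, assuming the 5-bit datacenter plus 5-bit worker split; the function names compose and decompose and the constant names are illustrative assumptions.

```python
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12

# Each field starts where the lower fields end.
WORKER_SHIFT = SEQUENCE_BITS                           # 12
DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS          # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS   # 22


def _check(name: str, value: int, bits: int) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} must fit in {bits} bits")


def compose(ms_since_epoch: int, datacenter: int, worker: int, sequence: int) -> int:
    """Pack the four fields into one non-negative 64-bit integer."""
    _check("timestamp", ms_since_epoch, TIMESTAMP_BITS)
    _check("datacenter", datacenter, DATACENTER_BITS)
    _check("worker", worker, WORKER_BITS)
    _check("sequence", sequence, SEQUENCE_BITS)
    return (
        (ms_since_epoch << TIMESTAMP_SHIFT)
        | (datacenter << DATACENTER_SHIFT)
        | (worker << WORKER_SHIFT)
        | sequence
    )


def decompose(snowflake_id: int) -> tuple[int, int, int, int]:
    """Return (ms_since_epoch, datacenter, worker, sequence)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        (snowflake_id >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    )


# Per-node ceiling: 4096 ids per millisecond times 1000 ms per second.
IDS_PER_SECOND_PER_NODE = (1 << SEQUENCE_BITS) * 1000
```

Because the timestamp occupies the highest bits, an id minted one millisecond later compares greater than any id from the previous millisecond, whatever its machine and sequence bits are.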
Clock skew hazard Slide 8
Two failure modes haunt every Snowflake implementation. The first is sequence overflow. If a node wants to mint more than four thousand ninety-six ids in a single millisecond, the twelve-bit sequence runs out. The generator then waits for the next millisecond, a busy wait that lasts less than one millisecond, and resumes with the sequence at zero. Callers see nothing beyond a slightly slower call. The second failure mode is clock skew, meaning the clock moves backwards. An NTP step correction, a leap second, or a VM migration can pull the wall clock back. If we reuse the smaller timestamp, we mint ids that look older than ones already issued and that may duplicate them. The defensive strategy is to remember the last timestamp. If now is smaller than the last timestamp, either wait for the clock to catch up or, on a large skew, refuse to generate and send an alert. Remember, gaps are cheap, duplicates are catastrophic.
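Both defenses fit in one small generator. What follows is a sketch under stated assumptions, not Twitter's code: the class and exception names, the arbitrary epoch_ms default, the injectable clock parameter, and the 5 ms max_wait_ms threshold are all illustrative choices. It uses a single 10-bit machine id and is not thread-safe.

```python
import time

MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class ClockMovedBackwards(RuntimeError):
    """The wall clock regressed further than we are willing to wait."""


def _system_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    # Not thread-safe: guard next_id() with a lock if threads share an instance.
    def __init__(self, machine_id, epoch_ms=1_700_000_000_000,
                 clock=_system_ms, max_wait_ms=5):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._epoch_ms = epoch_ms        # hypothetical custom epoch
        self._clock = clock              # returns wall-clock milliseconds
        self._max_wait_ms = max_wait_ms  # largest skew we will wait out
        self._last_ms = -1
        self._sequence = 0

    def _wait_until(self, target_ms: int) -> int:
        """Busy-wait until the clock reaches target_ms."""
        now = self._clock()
        while now < target_ms:
            now = self._clock()
        return now

    def next_id(self) -> int:
        now = self._clock()
        if now < self._last_ms:
            # Clock regression: wait out a small skew, refuse a large one.
            if self._last_ms - now > self._max_wait_ms:
                raise ClockMovedBackwards(f"clock is {self._last_ms - now} ms behind")
            now = self._wait_until(self._last_ms)
        if now == self._last_ms:
            self._sequence = (self._sequence + 1) & MAX_SEQUENCE
            if self._sequence == 0:
                # Sequence overflow: spin into the next millisecond.
                now = self._wait_until(self._last_ms + 1)
        else:
            self._sequence = 0
        self._last_ms = now
        return (((now - self._epoch_ms) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self._machine_id << SEQUENCE_BITS)
                | self._sequence)
```

Raising on a large regression leaves a gap in the id space rather than risking a duplicate. A production version would also alert, and would check that the timestamp still fits in 41 bits.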
Comparison Slide 10
Now put all four options side by side. Auto-increment: needs a database row lock for coordination, is sortable by time, is sixty-four bits, but is a single point of failure. UUID: no coordination, but not sortable by time, one hundred twenty-eight bits, and no single point of failure. Ticket server: needs a central RPC, is sortable by time, is sixty-four bits, but is a single point of failure. And Snowflake: coordination only for the machine id, sortable by time, sixty-four bits, and no single point of failure. There is also a middle path, database segmented ranges, which Instagram and Flickr use. The database hands each node a block of ids, and the node serves them locally. The rule of thumb: if a handful of nodes sit on an existing SQL stack, segmented ranges are the cheapest win, and if you have hundreds of producers and want zero coordination on the hot path, ship Snowflake.
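The segmented-ranges middle path can be sketched with sqlite3 standing in for the shared database. The table name id_segments, the class name SegmentAllocator, and the block size are illustrative assumptions; a real deployment would give each node its own connection to a shared server.

```python
import sqlite3


class SegmentAllocator:
    """Node-side allocator: claims a block from the database, then serves it locally."""

    def __init__(self, conn: sqlite3.Connection, block_size: int = 1000) -> None:
        self._conn = conn
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive; an empty block forces the first claim

    def _claim_block(self) -> None:
        # One transaction per block: advance the shared cursor, then read it back.
        with self._conn:
            self._conn.execute(
                "UPDATE id_segments SET next_start = next_start + ? "
                "WHERE name = 'default'",
                (self._block_size,),
            )
            (new_next,) = self._conn.execute(
                "SELECT next_start FROM id_segments WHERE name = 'default'"
            ).fetchone()
        self._next = new_next - self._block_size
        self._end = new_next

    def next_id(self) -> int:
        if self._next >= self._end:
            self._claim_block()  # the only step that touches the database
        value = self._next
        self._next += 1
        return value


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE id_segments (name TEXT PRIMARY KEY, next_start INTEGER NOT NULL)"
)
conn.execute("INSERT INTO id_segments VALUES ('default', 1)")
conn.commit()

node_a = SegmentAllocator(conn, block_size=3)
node_b = SegmentAllocator(conn, block_size=3)
a_ids = [node_a.next_id() for _ in range(2)]   # from block 1..3
b_ids = [node_b.next_id() for _ in range(2)]   # from block 4..6
a_more = [node_a.next_id() for _ in range(2)]  # finishes 1..3, claims 7..9
```

The database is touched once per block rather than once per id. The trade is that ids from different nodes interleave, so they are unique but only loosely ordered by time, and ids left in an unfinished block are lost if a node restarts.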
Recap Slide 11
So remember four things. One, coordinate as little as you possibly can, and push uniqueness into the structure of the id. Two, keep time in the high bits, so the id is sortable for free. Three, budget your bit width: how many years, how many machines, and how many ids per millisecond you need, then size the fields. And four, treat the clock as an adversary: detect regression, stop minting, and alert loudly. Optimize for the right failure mode: gaps over duplicates.
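The budgeting step in point three is plain arithmetic. A small sketch follows, assuming a 365.25-day year and one sign bit; the helper names bits_for and budget are illustrative assumptions. It is shown on the layout from this chapter, not on the exercise that follows.

```python
import math

# Milliseconds in an average (365.25-day) year.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def bits_for(count: int) -> int:
    """Smallest number of bits that can represent `count` distinct values."""
    return (count - 1).bit_length()


def budget(years: float, machines: int, ids_per_ms: int, total_bits: int = 64) -> dict:
    """Size each field, then report what is left over (negative means over budget)."""
    fields = {
        "sign": 1,
        "timestamp": bits_for(math.ceil(years * MS_PER_YEAR)),
        "machine": bits_for(machines),
        "sequence": bits_for(ids_per_ms),
    }
    fields["spare"] = total_bits - sum(fields.values())
    return fields


# The layout from this chapter: about 69 years, 1024 machines, 4096 ids per ms.
classic = budget(years=69, machines=1024, ids_per_ms=4096)
```

For the classic layout the fields come out as 1, 41, 10, and 12 bits with nothing to spare, which is why changing any one requirement forces a trade against another field.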
What's next
That's it for chapter seven. One exercise: budget the bit fields again. Suppose you need two hundred years of timestamps and one thousand twenty-four machines at millisecond resolution. How many bits does the timestamp need, what is left for the sequence within sixty-four bits, and would you drop to second resolution or steal a bit from the machine field? Think through that tradeoff. In the next chapter we will design a URL shortener.