Chapter 8 · Listen-Along Lesson

Design a URL Shortener

A short, satisfying design that reuses the unique ID ideas from Chapter 7 and turns a long link into seven characters.

What we are building Slide 2

Chapter 8: design a URL shortener. The whole product is just two endpoints: one that creates a short code for a long URL, and one that redirects a visitor back to that long URL. Settle the scope before you draw any boxes. Functionally we only need shorten and redirect, plus optional custom aliases, expiration, and basic click analytics. The real design, though, lives in the non-functional requirements. Redirect is the hot path: it must be fast, it must always be up, and it must be optimized for reads. The system is read-heavy, roughly 100 reads for every write. Availability should be four nines (99.99%), because one broken link breaks every place it was ever shared. And short codes must be unique and unpredictable.

Scale, back of envelope Slide 3

Now estimate the scale. Pick numbers you can defend, then multiply. The goal is order of magnitude, not precision. Assume 30 million daily active users, each creating three new links a day. Writes per day are then 30 million times 3, or 90 million. With a read-to-write ratio of 100 to 1, reads per day come to 9 billion. Spread over 86,400 seconds, that is an average of about 104,000 redirects per second, and a peak of roughly three times that, about 300,000 redirects per second. Five years of links is about 164 billion rows, and the storage comes to about 82 terabytes, which implies roughly 500 bytes per row. With indexes and replicas it lands around 250 terabytes. The conclusion is clear: a single Postgres instance cannot carry this. You need sharding plus an aggressive cache, and that is what shapes the rest of the design.
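The arithmetic above can be reproduced in a few lines. This is a minimal sketch: the user count, links per user, read ratio, and peak factor come from the lesson, while `BYTES_PER_ROW` is an assumed average row size chosen only because it reproduces the 82 terabyte figure.

```python
# Back-of-envelope estimate. BYTES_PER_ROW is an assumption, not a measured value.
DAILY_ACTIVE_USERS = 30_000_000   # 30 million
LINKS_PER_USER_PER_DAY = 3
READS_PER_WRITE = 100
PEAK_FACTOR = 3
YEARS = 5
BYTES_PER_ROW = 500               # hypothetical average row size
SECONDS_PER_DAY = 86_400

writes_per_day = DAILY_ACTIVE_USERS * LINKS_PER_USER_PER_DAY
reads_per_day = writes_per_day * READS_PER_WRITE
avg_reads_per_sec = reads_per_day / SECONDS_PER_DAY
peak_reads_per_sec = avg_reads_per_sec * PEAK_FACTOR
total_links = writes_per_day * 365 * YEARS
raw_storage_tb = total_links * BYTES_PER_ROW / 1e12  # decimal terabytes

print(f"writes per day      : {writes_per_day:,}")
print(f"reads per day       : {reads_per_day:,}")
print(f"avg redirects/sec   : {avg_reads_per_sec:,.0f}")
print(f"peak redirects/sec  : {peak_reads_per_sec:,.0f}")
print(f"links after 5 years : {total_links:,}")
print(f"raw storage (TB)    : {raw_storage_tb:.1f}")
```

Changing any one input and rerunning is a quick way to see which assumption the design is most sensitive to.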

Public API Slide 4

The public API is two endpoints that run the whole product. The first is POST /shorten, which takes a long URL, with an optional alias and expiry, and returns a short URL. It returns 201 on success, 409 if the alias is already taken, and 422 if the target is blocked. The second is GET /{short_code}, which redirects the browser to the original target, normally with a 302. An unknown code returns 404, and an expired one returns 410. Keep the surface small. One important point: shorten should be idempotent, so that the same user sending the same URL again gets the same short code. Also rate limit the shorten endpoint, per IP and per key. Do not rate limit redirect, because it is only a read.
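The status codes above can be sketched as two plain functions over an in-memory dictionary. Everything here is illustrative: `LINKS`, `BLOCKED_HOSTS`, and the function signatures are hypothetical, the caller is assumed to pass either the custom alias or an already generated code, and returning 200 for an idempotent repeat is an assumption the lesson does not specify.

```python
import time
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

BASE_URL = "https://sho.rt/"
BLOCKED_HOSTS = {"evil.example"}   # hypothetical blocklist
LINKS = {}                         # hypothetical store: short code -> Link


@dataclass
class Link:
    long_url: str
    expires_at: Optional[float] = None   # Unix seconds; None means never


def shorten(long_url, code, expires_at=None):
    """POST /shorten. Returns (status, body)."""
    if urlparse(long_url).hostname in BLOCKED_HOSTS:
        return 422, "target blocked"
    existing = LINKS.get(code)
    if existing is not None:
        if existing.long_url == long_url:
            return 200, BASE_URL + code   # idempotent repeat (assumed status)
        return 409, "alias already taken"
    LINKS[code] = Link(long_url, expires_at)
    return 201, BASE_URL + code


def redirect(code, now=None):
    """GET /{short_code}. Returns (status, location or None)."""
    now = time.time() if now is None else now
    link = LINKS.get(code)
    if link is None:
        return 404, None
    if link.expires_at is not None and link.expires_at <= now:
        return 410, None
    return 302, link.long_url
```

A real service would put rate limiting in front of `shorten` and would make the check-then-insert step atomic in the database.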

Data model Slide 5

In the data model, a single main table does almost all the work. Only one query matters on the hot path, the lookup by short code, so make that the primary key. The columns are as follows. short_code is the primary key, in practice seven base-62 characters. long_url is text of up to about 2 kilobytes, stored verbatim. owner_id is a foreign key to the users table and is null for anonymous links. expires_at is null when the link never expires. is_blocked is set by the abuse scanner and short-circuits the redirect. Shard the table by a hash of the short code. Per-click rows do not go in this table; they go to a separate columnar store, because this hot table should stay small and key-value shaped.
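The row shape and the shard choice might look like the sketch below. The field names mirror the columns described above; `NUM_SHARDS` and the use of SHA-256 as the shard hash are assumptions made for illustration. Python's built-in `hash()` is avoided on purpose because it is salted per process for strings, so it would not route the same code to the same shard across servers.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

NUM_SHARDS = 64   # hypothetical shard count


@dataclass(frozen=True)
class LinkRow:
    short_code: str                          # primary key, 7 base-62 characters
    long_url: str                            # stored verbatim, up to about 2 KB
    owner_id: Optional[int] = None           # FK to users; None for anonymous links
    expires_at: Optional[datetime] = None    # None means the link never expires
    is_blocked: bool = False                 # set by the abuse scanner


def shard_for(short_code: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from a stable hash of the short code."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo sharding reshuffles most keys when `num_shards` changes; consistent hashing is the usual alternative when shards are added often.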

How many characters do we need Slide 6

Now the most fun question: how many characters does a short code need? The code is written in base 62, that is the digits 0 to 9, lowercase a to z, and uppercase A to Z, 62 symbols in total. Each character carries just under 6 bits (log2 of 62 is about 5.95). So n characters give 62 to the power n codes. Five characters give about 916 million codes. Six give about 57 billion. Seven give about 3.5 trillion, and that is our choice. Eight give about 218 trillion. Converting an integer ID to a base-62 string is simple: divide the ID by 62 repeatedly and read the remainders in reverse. Seven characters are enough because even at 33 billion links a year they leave about a century of headroom. Do not under-buy: adding a character later becomes a flag day for every cached link.
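The divide-and-reverse conversion and the capacity figures can be checked with a short sketch. The alphabet order (digits, then lowercase, then uppercase) follows the description above; the function names are illustrative.

```python
import math

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)   # 62


def encode_base62(n: int) -> str:
    """Divide by 62 repeatedly and read the remainders in reverse."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(f"bits per character: {math.log2(BASE):.2f}")
for length in range(5, 9):
    print(f"{length} characters: {BASE ** length:,} codes")
```

The largest value that still fits in seven characters is `62**7 - 1`; anything at or above `62**7` needs an eighth.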

Figure: write and read paths.
WRITE (once per link): Client POSTs the long URL → Shortener creates the short code → links DB stores the mapping → short URL returned, e.g. https://sho.rt/aB3xK9q
READ (about 100 times more frequent): Client GETs the short URL → Redirect service looks up the code → cache (code → long_url) → links DB on a cache miss → 301 / 302 redirect to the long URL
A write stores the mapping and returns the short URL; a read resolves the code (cache first, DB on a miss) and issues a redirect. The read path is the one to optimize.

Approach A, hash Slide 7

Approach A: hash, then resolve collisions. Run the long URL through a cryptographic hash, truncate it to about 41 bits, and encode that as seven base-62 characters. Seven characters hold just under 42 bits, so a full 42-bit value would sometimes spill into an eighth character. Then look the code up. If the slot is empty, insert. If the same URL is already there, return the same code. But if the code maps to a different URL, that is a collision. In that case append a salt to the input and hash again until you find a free slot, within a small retry limit. The benefit is that workers stay stateless: any server can shorten without coordination, and shortening the same URL again yields the same code, so idempotency comes for free. The cost is a read before every write to check uniqueness, and a retry rate that climbs as the key space fills. Codes look pseudo-random, which is good for privacy and a little awkward for debugging.
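A minimal sketch of the hash-and-retry loop follows. Several details are assumptions rather than the lesson's method: SHA-256 as the hash, a `#attempt` suffix as the salt, a plain dictionary standing in for the links table, and reduction modulo 62**7 instead of bit truncation so that the result always fits in exactly seven characters.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODE_LEN = 7
KEYSPACE = 62 ** CODE_LEN
MAX_RETRIES = 5   # hypothetical retry limit


def encode_fixed(n: int) -> str:
    """Encode n (0 <= n < 62**7) as exactly seven base-62 characters."""
    chars = []
    for _ in range(CODE_LEN):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def candidate_code(long_url: str, attempt: int) -> str:
    """Hash the URL (salted on retries) down to a seven-character code."""
    salted = long_url if attempt == 0 else f"{long_url}#{attempt}"
    digest = hashlib.sha256(salted.encode("utf-8")).digest()
    return encode_fixed(int.from_bytes(digest[:8], "big") % KEYSPACE)


def shorten(long_url: str, store: dict) -> str:
    """Return a code for long_url, re-hashing with a salt on collision."""
    for attempt in range(MAX_RETRIES):
        code = candidate_code(long_url, attempt)
        existing = store.get(code)
        if existing is None:
            store[code] = long_url   # a real DB needs an atomic insert-if-absent
            return code
        if existing == long_url:
            return code              # same URL again: idempotent
    raise RuntimeError("retry limit reached")
```

The read-then-write in `shorten` is the uniqueness check the text describes; with a real database it would typically be a single conditional insert so two workers cannot claim the same slot.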

Approach B, counter and base 62 Slide 8

Approach B: counter plus base 62. Give every new link a unique integer ID, then encode it in base 62. No collisions and no retries, by construction: every ID is distinct, so every code is distinct. The ID can come from a sharded counter, from ticket batches where each worker claims a thousand IDs at a time, or from a Snowflake-style generator like the one in Chapter 7. One big warning, though. If you use a plain sequential counter, codes become adjacent and enumerable. A competitor can scrape neighboring links and read off your traffic volume, and private links become easy to discover. The fix is to pass the ID through a fixed bijection before base-62 encoding, for example XOR with a secret, or multiplication by an odd constant modulo a power of two. The output then looks random but stays collision-free. The cost is coordination: the ID source is a contention point that you have to scale. The benefit is simplicity: one write per shorten and zero collision logic.
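The bijection idea can be sketched as multiply-then-XOR over a 41-bit space, chosen because 2**41 is below 62**7 and so every scrambled ID still fits in seven characters. `MULTIPLIER` and `XOR_SECRET` are hypothetical constants; the multiplier must be odd to be invertible modulo a power of two. This is obfuscation, not encryption, so treat it as a way to hide adjacency rather than as a security boundary.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODE_LEN = 7
BITS = 41                      # 2**41 < 62**7, so outputs fit in 7 characters
MASK = (1 << BITS) - 1
MULTIPLIER = 0x5DEECE66D       # hypothetical odd constant
XOR_SECRET = 0x1A2B3C4D5E      # hypothetical secret, below 2**41
INVERSE = pow(MULTIPLIER, -1, 1 << BITS)   # modular inverse (Python 3.8+)


def scramble(n: int) -> int:
    """Bijection on [0, 2**41): multiply by an odd constant, then XOR."""
    if not 0 <= n <= MASK:
        raise ValueError("id out of range")
    return ((n * MULTIPLIER) & MASK) ^ XOR_SECRET


def unscramble(m: int) -> int:
    """Inverse of scramble: undo the XOR, then multiply by the inverse."""
    return ((m ^ XOR_SECRET) * INVERSE) & MASK


def encode_fixed(n: int) -> str:
    """Encode n as exactly seven base-62 characters."""
    chars = []
    for _ in range(CODE_LEN):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def code_for(link_id: int) -> str:
    """Sequential ID in, non-adjacent but collision-free code out."""
    return encode_fixed(scramble(link_id))


for link_id in (1000, 1001, 1002):
    print(link_id, code_for(link_id))
```

Because `scramble` is invertible, two different IDs can never produce the same code, which is the property that lets this approach skip the uniqueness read entirely.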

Caching, read path Slide 9

Caching, to make the read path fast. Redirects follow a power law: a few viral links carry most of the traffic, so keep those rows as close to the user as you can. Three tiers absorb the load in turn. The CDN catches the viral head at the edge: a hot short URL is answered there and never reaches our servers. Next, Redis at the app tier, hash-partitioned by short code, stores the code-to-long-URL mapping with LRU eviction and a TTL of about 24 hours, long enough to ride out a spike and short enough to pick up edits and revocations. The database is the last tier and should see only cold misses. Do negative caching as well: cache 404s for a minute or two so that scrapers probing random codes do not burn the DB. When a link is blocked or edited, delete it from Redis and purge the CDN; the TTL is the safety net in case an invalidation fails to deliver.
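The app-tier lookup with negative caching can be sketched with a small in-process TTL cache standing in for Redis. The class, the `resolve` function, and the exact TTL values are illustrative assumptions; LRU eviction and the CDN tier are left out to keep the example short.

```python
import time

POSITIVE_TTL = 24 * 3600   # about 24 hours for known codes
NEGATIVE_TTL = 60          # about a minute for unknown codes (cached 404s)


class TTLCache:
    """Tiny stand-in for Redis: key -> (value, expiry time)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        """Return (found, value); found is False on a miss or expired entry."""
        entry = self._data.get(key)
        if entry is None:
            return False, None
        value, expires_at = entry
        if expires_at <= self._clock():
            del self._data[key]
            return False, None
        return True, value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key):
        """Explicit invalidation, used when a link is blocked or edited."""
        self._data.pop(key, None)


def resolve(code, cache, db):
    """Cache first, DB on a miss; None means the code is unknown."""
    found, value = cache.get(code)
    if found:
        return value                     # may be None: a cached "not found"
    long_url = db.get(code)              # db: any mapping of code -> long URL
    if long_url is None:
        cache.set(code, None, NEGATIVE_TTL)
    else:
        cache.set(code, long_url, POSITIVE_TTL)
    return long_url
```

Returning a `(found, value)` pair is what lets the cache tell "never looked up" apart from "looked up and known to be missing", which is the whole point of negative caching.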

301 versus 302 Slide 10

Now the highest-signal tradeoff: 301 versus 302. Both redirect the browser. The difference is who remembers, and that decides whether you ever see the next click. 301 means permanent. The browser caches the mapping for a long time, often indefinitely, and later visits skip your server and go straight to the long URL. Server load drops, but you do not see those later clicks, analytics under-report, and editing or revoking a link becomes painful because stale caches keep hitting the old target for months. 302 means temporary. The browser treats every visit as fresh, so every click hits your service. Server load is higher, but you see every click, analytics stay accurate, and an edit or revocation takes effect as soon as your own caches are invalidated. That is why a shortener that relies on click analytics has a strong reason to default to 302. The textbook answer for shorteners is 302, unless you have measured the load and decided you would rather lose the analytics.
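The choice shows up as a status code plus caching headers on the response. The sketch below is only an illustration: the function name and the specific `Cache-Control` values are assumptions, and a deployment that lets the CDN answer hot links would need headers that permit edge caching.

```python
def redirect_response(long_url: str, permanent: bool = False):
    """Build (status, headers) for a redirect to long_url."""
    if permanent:
        # 301 is cacheable by default, so browsers may reuse it and skip us.
        headers = {
            "Location": long_url,
            "Cache-Control": "public, max-age=31536000",   # assumed value
        }
        return 301, headers
    # 302 is not cached unless headers allow it; no-store makes that explicit,
    # so every click reaches the service and is counted.
    headers = {
        "Location": long_url,
        "Cache-Control": "no-store",                       # assumed value
    }
    return 302, headers


status, headers = redirect_response("https://example.com/landing")
print(status, headers["Location"])
```

Keeping the decision behind one flag makes it cheap to switch after measuring real load.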

Stopping bad actors Slide 11

Stopping abuse. A shortener is a redirection oracle for the open web: phishing kits, malware drops, and scam funnels all want to hide behind your domain. The defense is a layered pipeline, never a single switch. The first layer is a scan at creation: on the shorten endpoint, check the target against Safe Browsing, PhishTank, and internal blocklists, and reject known-bad URLs immediately. The second is an asynchronous re-scan, because a URL that was clean yesterday can be a compromised host today. The third is rate limits on creators, per IP, per key, and per user, on shorten only; leave reads unthrottled so viral content is not punished. The fourth is user reports and interstitials: several reports trip a soft block, and for medium-confidence risk you show a "may be unsafe" click-through page before redirecting. Remember, no single layer catches everything.
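Two of these layers, the create-time blocklist check and the creator rate limit, can be sketched together. The token bucket is one common rate-limiting technique, swapped in here as an assumption since the lesson does not name an algorithm; `BLOCKLIST`, `TokenBucket`, and `check_create` are hypothetical names, and calls to external services such as Safe Browsing are deliberately left out.

```python
import time
from urllib.parse import urlparse

BLOCKLIST = {"phish.example", "malware.example"}   # hypothetical internal list


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_sec)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


def check_create(long_url: str, bucket: TokenBucket):
    """Return (allowed, reason) for one shorten request from one creator."""
    if not bucket.allow():
        return False, "rate_limited"
    host = (urlparse(long_url).hostname or "").lower()
    if host in BLOCKLIST:
        return False, "blocked_target"
    return True, "ok"
```

In practice there would be one bucket per IP, per key, and per user, and only the shorten path would consult them.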

Recap Slide 12

So remember five things. One, the read path wins the argument. At 100 reads per write, every microsecond on the redirect dominates the total cost, so design the lookup first. Two, hash versus counter is a question of coordination. Hashing gives you stateless workers and charges you in retries; a counter gives you guaranteed uniqueness and charges you in a contended ID service. Three, cache aggressively and in layers: the CDN takes the viral head, Redis the warm body, and the DB sees only cold misses. Four, use 302 until you have measured, because until then the load saving from 301 is only a guess. And five, short codes are public, so do not rely on obscurity; randomize enough that adjacent codes do not reveal each other.

What's next

That's it for Chapter 8. One exercise to think through. Suppose you have to address 100 billion URLs with comfortable headroom. How many base-62 characters do you need, and would you choose hashing or a counter? Defend your choice, say which failure mode you would rather operate, and explain how you would keep codes from being guessable. In the next chapter we will design a web crawler.