After fifteen chapters of designing newsfeeds, chat systems, search engines, and storage layers, the same shapes keep returning. This chapter is the quiet step back — naming the recurring building blocks, the patterns that bind them, and the tradeoffs you cannot avoid.
Every system-design interview, every production sketch, ends up reaching for some subset of these. Knowing what each one is actually for — and when it is overkill — matters more than memorising any particular product.
Vertical scaling is easy until it is not. Horizontal scaling buys you elasticity, blast-radius limits, and rolling deploys — but only if no single request carries hidden affinity to a single box.
Browsers cache, CDNs cache, application servers cache, databases cache. Each layer cuts a zero off the latency. None of them help if you cannot answer one question: when does the cached value stop being true?
Browser → CDN → edge cache → app-level memory → distributed cache → DB buffer pool. Each layer absorbs a slice of traffic before the next.
Pick the strategy that matches the cost of a stale read. Write-through keeps things fresh; write-back wins on throughput but can lose data.
TTLs are the lazy default. Explicit deletes on write are sharper. Versioned keys (user:42:v17) sidestep invalidation entirely.
Thundering herd on expiry, hot keys overloading one node, stampede on cold start. Solve with jittered TTLs, request coalescing, and warmups.
Synchronous calls force the slowest dependency to dictate the user-facing latency, and they propagate failure outward. Queues turn coupled chains into independent stages — each one can scale, retry, and fail without taking the rest down.
They are not the same lever. Replicas keep you serving when a node dies and let reads scale. Shards let writes and storage exceed what any one machine can hold. Most large systems do both — and most of the pain comes from confusing one for the other.
Why: survive a node loss; offload reads.
Cost: replication lag; split-brain risk on failover; same dataset everywhere — does not grow your capacity.
Knob: sync vs async replicas; how many you can lose before quorum fails.
Why: dataset or write rate too large for one machine.
Cost: cross-shard joins; rebalancing is painful; choosing a key you cannot change later.
Knob: shard key (user, tenant, geo, hash); rebalance strategy.
Shard for capacity, then replicate each shard for availability. A user lookup hashes to shard 7, which has a primary plus two replicas. Lose any one node, you are still up.
A normalized schema is beautiful and slow. Once reads outnumber writes by orders of magnitude — feeds, search, profile pages — you copy the data into shapes that match each query. The price is duplication, and the discipline of keeping copies eventually consistent.
The original theorem is starker than reality — partitions are rare, brief, and rarely total. The useful framing: you are designing every distributed call to either pause until it can be consistent, or answer with possibly stale data and reconcile later. Different products want different answers.
Banking, ticket inventory, distributed locks. Better to be unavailable for ten seconds than to double-spend or oversell.
Social feeds, product catalogues, DNS. A slightly stale "last seen" is fine; an error page is not.
A single-machine database, or a tightly coupled rack with no network in the middle. The moment you span data centres, you choose C or A.
In real systems, the choice is per-operation. The same database can serve read-your-writes for the user's own profile (CP-leaning) and last-writer-wins for a notification count (AP-leaning).
No system is good at all four at once. The job of a designer is to name where each knob sits today, what moving it costs, and which neighbour it drags along.
Batching, buffering, and pipelining raise throughput by holding work until it adds up. They also raise the time any single request waits. Pick where you sit and tell users the truth.
Five-nines costs roughly an order of magnitude more than three-nines. Decide which user journeys actually need it — checkout probably, vanity counter probably not.
Reading chapters is the easy part. The skill compounds through three habits — and none of them happen by accident.
Run a thing. Break it. Fix it.
Production code teaches what blog posts cannot.
Writing forces the vague to become concrete.
Most of system design is not invention. It is selection — naming the few shapes that match the problem, picking honestly, and accepting the cost.
The interesting question is always "what fails next?" Every component has a breaking point. Designing well means knowing which one will go first and what happens when it does.
Pick the simplest thing that survives your next 10× of growth. Not your next 1000×. Premature scale carries its own failure modes; defer them until the numbers force your hand.
State is the enemy of horizontal scale. Push it to the edges — caches, queues, databases — and keep the middle layer disposable.
Eventual consistency is fine if you can name when it matters. Identify the few flows where stale data hurts users, and design those flows differently from the rest.
Every choice has a bill. Pay it on purpose. Latency, cost, complexity, operational burden — the goal is not to avoid the bill, but to know which one you are signing for.