Chapter Sixteen · Synthesis

Patterns & Lessons.

After fifteen chapters of designing newsfeeds, chat systems, search engines, and storage layers, the same shapes keep returning. This chapter is the quiet step back — naming the recurring building blocks, the patterns that bind them, and the tradeoffs you cannot avoid.

A closing chapter · 11 slides
02 · The Toolkit

Six pieces show up in almost every design.

Every system-design interview, every production sketch, ends up reaching for some subset of these. Knowing what each one is actually for — and when it is overkill — matters more than memorising any particular product.

Load Balancer
Shows up the moment one server is not enough. Spreads requests, hides failures behind a stable endpoint, terminates TLS.
Cache
When the same answer is computed over and over. Reads dominate writes; hot keys appear. Saves CPU and DB load, and cuts tail latency.
Queue
When work can wait, or when one slow consumer should not crash the whole producer. Smooths spikes, decouples failure domains.
Database
The source of truth. Relational for transactions and joins; key-value or document for scale and flexible shape. Pick by access pattern, not vibes.
CDN
When users are global and bytes are big or repetitive. Static assets, video, images, even cached API responses served close to the user.
Blob Store
Big, opaque, write-once objects: photos, videos, backups, model checkpoints. Cheap, durable, slow per-request. Put a CDN in front.
03 · Pattern

Scale outward, and keep the servers stateless.

Vertical scaling is easy until it is not. Horizontal scaling buys you elasticity, blast-radius limits, and rolling deploys — but only if no single request carries hidden affinity to a single box.

  • Push session state out. Cookies, signed tokens, or a shared cache. Any server can serve any request.
  • Make handlers idempotent. A retry from the load balancer should never charge a card twice or duplicate a message.
  • Treat instances as cattle. Auto-scale on real signals — queue depth, p99 latency — not just CPU.
  • Watch the shared dependencies. If every stateless box hammers the same database, you have only moved the bottleneck one hop.
[Diagram: clients → load balancer → stateless app servers A, B, C → shared state. Any client can reach any app server; the session lives outside.]
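One way to push session state out is a signed token that any server can verify on its own. The sketch below is a minimal illustration, not a production scheme: the names `issue_token`, `verify_token`, and `SECRET` are hypothetical, and it assumes every app server shares the same secret.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: every app server is configured with the same secret.
SECRET = b"demo-secret-rotate-me"


def issue_token(user_id: str, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Pack the session into the token itself, then sign it."""
    issued = time.time() if now is None else now
    payload = json.dumps({"uid": user_id, "exp": issued + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload)
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if signature and expiry check out, else None."""
    current = time.time() if now is None else now
    try:
        body, sig = token.encode().rsplit(b".", 1)
    except ValueError:
        return None  # no separator: not one of our tokens
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered with, or signed with another secret
    try:
        claims = json.loads(base64.urlsafe_b64decode(body))
    except ValueError:
        return None
    if claims["exp"] < current:
        return None  # expired
    return claims["uid"]
```

No server remembers anything between requests here: the token carries the session and the signature proves it was not altered. The cost is that revoking a token before it expires needs some shared lookup again.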
04 · Pattern

Cache everywhere. The hard part is knowing when to throw it away.

Browsers cache, CDNs cache, application servers cache, databases cache. A hit at any layer can cut latency by an order of magnitude or more compared with going to the origin. None of them helps if you cannot answer one question: when does the cached value stop being true?


Layered, not single

Browser → CDN → edge cache → app-level memory → distributed cache → DB buffer pool. Each layer absorbs a slice of traffic before the next.

Read-through, write-through, write-back

Pick the strategy that matches the cost of a stale read. Write-through keeps things fresh; write-back wins on throughput but can lose data.
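The difference between write-through and write-back can be shown with two toy caches in front of a dict that stands in for the database. This is a sketch under that assumption; the class names and the `crash` method are made up for illustration.

```python
class WriteThroughCache:
    """Every write goes to the store first, then to the cache."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value      # synchronous: the store is never behind
        self.cache[key] = value

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]


class WriteBackCache:
    """Writes land in the cache and reach the store only on flush."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}
        self.dirty = set()        # keys written but not yet persisted

    def write(self, key, value):
        self.cache[key] = value   # fast path: no store round trip
        self.dirty.add(key)

    def flush(self) -> int:
        """Persist dirty keys; return how many were written."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed

    def crash(self):
        """Simulate losing the cache node before a flush."""
        self.cache.clear()
        self.dirty.clear()
```

Calling `crash()` between `write()` and `flush()` is exactly the data-loss window that write-back trades for throughput.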

Invalidation tools

TTLs are the lazy default. Explicit deletes on write are sharper. Versioned keys (user:42:v17) sidestep explicit deletes: bump the version on write, and let old entries age out.
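A versioned key can be sketched in a few lines. The `VersionedCache` class below is hypothetical, and it assumes the version counter lives next to the source of truth so every reader sees the same number.

```python
class VersionedCache:
    """Cache keyed by user id plus a version that bumps on every write."""

    def __init__(self):
        self.db = {}        # stands in for the source of truth
        self.versions = {}  # user_id -> current version (assumed durable)
        self.cache = {}     # stands in for a shared cache

    def _key(self, user_id) -> str:
        return f"user:{user_id}:v{self.versions.get(user_id, 0)}"

    def write(self, user_id, profile):
        self.db[user_id] = profile
        # Bumping the version makes the old cache key unreachable.
        self.versions[user_id] = self.versions.get(user_id, 0) + 1

    def read(self, user_id):
        key = self._key(user_id)
        if key not in self.cache:
            self.cache[key] = self.db[user_id]
        return self.cache[key]
```

Nothing is ever deleted on write, so stale entries such as `user:42:v1` linger until a TTL or the cache's eviction policy removes them. A real cache needs one of the two.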

The pathologies

Thundering herd on expiry, hot keys overloading one node, stampede on cold start. Solve with jittered TTLs, request coalescing, and warmups.
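Two of those fixes fit in a short sketch: a jittered TTL so keys do not all expire together, and request coalescing so concurrent misses on one key trigger a single load. `jittered_ttl` and `SingleFlight` are illustrative names, and the coalescing here is in-process only (it assumes threads, not multiple machines).

```python
import random
import threading


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    """Return the base TTL moved by up to +/- spread (10% by default)."""
    return base_seconds * (1 + random.uniform(-spread, spread))


class SingleFlight:
    """Coalesce concurrent loads of the same key into one loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> entry shared by leader and followers

    def do(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "value": None,
                         "error": None, "waiters": 0}
                self._inflight[key] = entry
            else:
                entry["waiters"] += 1
        if not leader:
            entry["done"].wait()          # block until the leader finishes
            if entry["error"] is not None:
                raise entry["error"]
            return entry["value"]
        try:
            entry["value"] = loader()     # only the leader hits the backend
            return entry["value"]
        except Exception as exc:
            entry["error"] = exc          # followers re-raise the same error
            raise
        finally:
            with self._lock:
                del self._inflight[key]
            entry["done"].set()
```

Usage would look like `sf.do("user:42", lambda: load_from_db(42))`, where `load_from_db` is whatever slow call the cache is protecting.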

Latency budget per layer (typical)
  Browser cache                 0 ms
  CDN edge                      ~10 ms
  In-process cache              ~0.1 ms
  Distributed cache (Redis)     ~1 ms
  Database                      ~5–50 ms
  Disk / cold storage           100 ms+
05 · Pattern

When work can wait, put it in a queue.

Synchronous calls force the slowest dependency to dictate the user-facing latency, and they propagate failure outward. Queues turn coupled chains into independent stages — each one can scale, retry, and fail without taking the rest down.

  • Sync is for "I cannot answer without it." Logging in, checking out, fetching a feed. Everything else can be queued.
  • Producers and consumers scale independently. A traffic spike fills the queue; the consumer pool drains it on its own timeline.
  • Retries become safe. With idempotent consumers and dead-letter queues, a transient failure does not become a lost event.
  • The cost is observability. You now need queue depth, age of oldest message, and consumer lag in your dashboards — not just request latency.
[Diagram: API producer → durable, ordered queue → workers 1 … N, with a dead-letter queue for failed messages. The API returns 202 fast; workers scale independently of producers.]
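The retry bullet can be made concrete with a toy consumer loop. This is a sketch assuming an in-memory `deque` as the queue and a set of processed ids as the idempotency record; a real broker and a durable store would replace both.

```python
from collections import deque


def drain(queue, handler, processed_ids, dead_letters, max_attempts=3):
    """Process messages until the queue is empty.

    Duplicates are skipped, failures are retried, and messages that
    keep failing are parked in the dead-letter list.
    """
    while queue:
        msg = queue.popleft()
        if msg["id"] in processed_ids:
            continue                      # duplicate delivery: already done
        try:
            handler(msg)
            processed_ids.add(msg["id"])  # record success for idempotency
        except Exception:
            msg["attempts"] = msg.get("attempts", 0) + 1
            if msg["attempts"] >= max_attempts:
                dead_letters.append(msg)  # give up, keep for inspection
            else:
                queue.append(msg)         # retry after the rest of the queue
```

A transient failure costs one extra pass through the queue, and a poison message ends up in `dead_letters` instead of blocking everything behind it. Note that this toy records success after the handler runs, so a crash between the two would replay the message; the handler itself still has to tolerate that.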
06 · Pattern

Replicate for availability. Shard for capacity.

They are not the same lever. Replicas keep you serving when a node dies and let reads scale. Shards let writes and storage exceed what any one machine can hold. Most large systems do both — and most of the pain comes from confusing one for the other.


Replication

Why: survive a node loss; offload reads.
Cost: replication lag; split-brain risk on failover; same dataset everywhere — does not grow your capacity.
Knob: sync vs async replicas; how many you can lose before quorum fails.
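The quorum knob is mostly arithmetic. The helpers below are a small sketch assuming a majority quorum over n replicas; the function names are illustrative.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def tolerable_failures(n: int) -> int:
    """How many replicas can be lost while a majority still exists."""
    return n - majority(n)


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set overlaps every write set (r + w > n)."""
    return r + w > n
```

With three replicas you can lose one; with five, two. Going from three to four buys nothing, because `tolerable_failures(4)` is still 1.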

Sharding

Why: dataset or write rate too large for one machine.
Cost: cross-shard joins; rebalancing is painful; choosing a key you cannot change later.
Knob: shard key (user, tenant, geo, hash); rebalance strategy.

The combination most large systems land on

Shard for capacity, then replicate each shard for availability. A user lookup hashes to shard 7, which has a primary plus two replicas. Lose any one node, you are still up.

[Diagram: 3 shards × 3 replicas. Shard A holds users 0–33%, Shard B 34–66%, Shard C 67–100%; each shard has one primary and two replicas. Capacity grows left to right; availability grows top to bottom.]
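A routing layer for the shard-then-replicate layout might look like the sketch below. It assumes hash-modulo sharding (the diagram shows ranges instead), and the node names and helper functions are invented for illustration.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user id to a shard with a stable hash."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_cluster(num_shards: int, replicas: int = 2):
    """Each shard is a list: the primary first, then its replicas."""
    return [
        [f"s{i}-primary"] + [f"s{i}-replica{j}" for j in range(1, replicas + 1)]
        for i in range(num_shards)
    ]


def pick_reader(cluster, user_id: str, down=frozenset()) -> str:
    """Return the first healthy node in the user's shard."""
    nodes = cluster[shard_for(user_id, len(cluster))]
    for node in nodes:
        if node not in down:
            return node
    raise RuntimeError("every node in the shard is down")
```

Modulo hashing keeps the example short, but changing `num_shards` remaps most keys. That is the "rebalancing is painful" cost in practice, and it is why schemes such as consistent hashing exist.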
07 · Pattern

Denormalize for reads. Accept that writes will catch up later.

A normalized schema is beautiful and slow. Once reads outnumber writes by orders of magnitude — feeds, search, profile pages — you copy the data into shapes that match each query. The price is duplication, and the discipline of keeping copies eventually consistent.

  • Materialise the view. Precompute the feed, the leaderboard, the search index. Reads become a single key lookup.
  • Fan-out on write. When a creator posts, push the item into every follower's inbox. Costly write, trivial read.
  • Fan-out on read. For celebrities with millions of followers, flip it: compute the feed at read time and cache it briefly.
  • Accept eventual consistency. A "like" count may lag by seconds. Almost no user cares. Phrase your product around that truth.
  • Have a rebuild path. When (not if) a copy drifts, you need a job that recomputes it from the source of truth.
[Diagram: posts (normalized source of truth) fan out on write to three read shapes: feed view (per-follower inbox), search index (inverted, by token), counters cache (likes, views, replies). Each read hits exactly one shape.]
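The two fan-out bullets combine into a hybrid: push for ordinary authors, pull for celebrities. The sketch below is one possible shape, with a hypothetical `Feeds` class and a deliberately tiny `CELEBRITY_THRESHOLD` so the switch is easy to see.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # assumption: tiny on purpose, for demonstration


class Feeds:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> followers
        self.following = defaultdict(set)  # user -> authors followed
        self.posts = defaultdict(list)     # author -> [(ts, author, text)]
        self.inbox = defaultdict(list)     # user -> [(ts, author, text)]

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author) -> bool:
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, ts, text):
        item = (ts, author, text)
        self.posts[author].append(item)    # source of truth
        if not self.is_celebrity(author):
            # Fan-out on write: costly write, trivial read.
            for follower in self.followers[author]:
                self.inbox[follower].append(item)

    def read_feed(self, user, limit=10):
        items = set(self.inbox[user])
        # Fan-out on read: merge celebrity posts at read time.
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.posts[author])
        return sorted(items, reverse=True)[:limit]
```

The `set` in `read_feed` removes duplicates for authors who crossed the threshold after some of their posts were already pushed into inboxes.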
08 · Tradeoff

CAP, in practice: when the network splits, pick a side.

The original theorem is starker than reality — partitions are rare, brief, and rarely total. The useful framing: you are designing every distributed call to either pause until it can be consistent, or answer with possibly stale data and reconcile later. Different products want different answers.

CP — pause writes when partitioned

Banking, ticket inventory, distributed locks. Better to be unavailable for ten seconds than to double-spend or oversell.

AP — answer with what you have

Social feeds, product catalogues, DNS. A slightly stale "last seen" is fine; an error page is not.

CA — only when partitions don't happen

A single-machine database, or a tightly coupled rack whose internal network you treat as never failing. The moment you span data centres, partitions will happen, and during one you choose C or A.

In real systems, the choice is per-operation. The same database can serve read-your-writes for the user's own profile (CP-leaning) and last-writer-wins for a notification count (AP-leaning).

[Diagram: CAP triangle with corners C (consistency), A (availability), P (partition tolerance). CP gives up A: Spanner, etcd, ZooKeeper. AP gives up C: Cassandra, Dynamo, Couch. CA: single node only.]
09 · Tradeoff

Latency vs throughput. Cost vs reliability.

No system is good at all four at once. The job of a designer is to name where each knob sits today, what moving it costs, and which neighbour it drags along.

Latency ↔ Throughput
low latency (per request) ↔ high throughput (aggregate)

Batching, buffering, and pipelining raise throughput by holding work until it adds up. They also raise the time any single request waits. Pick where you sit and tell users the truth.
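A toy cost model shows the trade. Assume each batch pays a fixed overhead plus a per-item cost, and that every item in a batch finishes when the batch does; the numbers and the `batch_stats` name are illustrative, not measurements.

```python
def batch_stats(batch_size: int, overhead_ms: float = 5.0,
                per_item_ms: float = 0.5):
    """Return (time for one batch in ms, throughput in items per second)."""
    batch_ms = overhead_ms + batch_size * per_item_ms
    throughput = batch_size / batch_ms * 1000
    return batch_ms, throughput


for size in (1, 10, 100):
    ms, per_second = batch_stats(size)
    print(f"batch={size:>3}  completes in {ms:5.1f} ms  {per_second:7.1f} items/s")
```

Under these assumptions a batch of 100 moves roughly ten times as many items per second as a batch of 1, and each item waits ten times as long (55 ms instead of 5.5 ms). The model ignores the time spent waiting for a batch to fill, which only makes latency worse.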

Cost ↔ Reliability
cheap (1 region, no replicas) ↔ resilient (multi-region, quorum)

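Five nines allows about five minutes of downtime a year; three nines allows almost nine hours. As a rule of thumb, closing that gap costs roughly an order of magnitude more. Decide which user journeys actually need it: checkout probably, vanity counter probably not.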
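The downtime budget behind each count of nines is easy to compute. A short sketch, assuming a 365-day year; the function name is illustrative.

```python
def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per 365-day year at the given number of nines."""
    unavailability = 10 ** -nines          # e.g. 3 nines -> 0.001
    return unavailability * 365 * 24 * 60


for n in (2, 3, 4, 5):
    minutes = downtime_minutes_per_year(n)
    print(f"{n} nines: {minutes:9.2f} min/year ({minutes / 60:6.2f} hours)")
```

Three nines leaves about 525.6 minutes a year, which is room for a human to be paged and fix something. Five nines leaves about 5.3 minutes, which is not, so recovery has to be automatic.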

Decision tree: do I really need X?
Start: a slow query.
  • Same answer often? → add a cache
  • Data too big? → shard or index
  • Still slow? → precompute view
  • Writes hot? → queue + workers
Measure before the next move.
10 · Practice

How to actually keep getting better at this.

Reading chapters is the easy part. The skill compounds through three habits — and none of them happen by accident.


1. Build the smallest version

Run a thing. Break it. Fix it.

  • Stand up a multi-node Redis and watch what happens when one node dies mid-write.
  • Wire Kafka between two services and intentionally slow the consumer to see lag pile up.
  • Shard a Postgres database by user id and write the rebalance script before you need it.
  • Run a load test against your own toy API and find the first thing that snaps.

2. Read other people's source

Production code teaches what blog posts cannot.

  • Pick one open-source system you depend on — Postgres, Kafka, Envoy, etcd — and read its design docs end to end.
  • Trace a single request through the codebase. Note every queue, lock, and retry.
  • Subscribe to a couple of engineering blogs that publish real incident write-ups, not marketing.

3. Design on paper, weekly

Writing forces the vague to become concrete.

  • Pick a product you use. Sketch how you'd build the part you understand least.
  • Write the design as if a stranger will implement it next week — include numbers, not adjectives.
  • Re-read it a month later. What was wrong? What did you not know to ask?
  • Keep a running list of "things I do not yet understand." It is your real curriculum.
11 · In closing

Five principles to leave with.

Most of system design is not invention. It is selection — naming the few shapes that match the problem, picking honestly, and accepting the cost.

01

The interesting question is always "what fails next?" Every component has a breaking point. Designing well means knowing which one will go first and what happens when it does.

02

Pick the simplest thing that survives your next 10× of growth. Not your next 1000×. Premature scale carries its own failure modes; defer them until the numbers force your hand.

03

State is the enemy of horizontal scale. Push it to the edges — caches, queues, databases — and keep the middle layer disposable.

04

Eventual consistency is fine if you can name when it matters. Identify the few flows where stale data hurts users, and design those flows differently from the rest.

05

Every choice has a bill. Pay it on purpose. Latency, cost, complexity, operational burden — the goal is not to avoid the bill, but to know which one you are signing for.
