Chapter 2 · System Design Fundamentals
Back-of-the-Envelope Estimation
Order-of-magnitude reasoning for capacity, latency, storage, and bandwidth — the napkin math that tells you in seconds whether a design is sane, and where the real bottleneck will live.
An estimate is not a forecast; it's a feasibility check. It tells you whether you need one box or a thousand, which dimension dominates cost, and whether the design quietly assumes something impossible. The whole skill is one move repeated: decompose a vague problem into a chain of small, defensible multipliers, then round hard.
Alex Xu, System Design Interview (Vol. 1), Chapter 2; the latency ladder traces back to Jeff Dean's "Numbers Everyone Should Know." Companion deck: slides — jump to any slide with the Slide N chips.
Why estimate at all Slide 2
The same ten-minute estimate pays off in four places — and in an interview, the number matters far less than whether you can get to it without panic.
In interviews
Show you can break a vague prompt into users, requests, payload, and replication — and reach a defensible total out loud.
In production
Prevents both costly over-provisioning and embarrassing under-provisioning, and surfaces which dimension dominates the bill.
In planning
If a rough sketch already needs 80 PB of RAM, the design is wrong. Better to learn that in a meeting than three sprints in.
In conversation
Numbers anchor arguments. "It feels slow" becomes "p99 is 240 ms but the budget was 80 ms" — now the team can act.
Powers of two Slide 3
Memorise these once and storage math becomes mental arithmetic. For envelope work, treat each binary step (2^10, 2^20, 2^30) as its nearest power of ten. Each step is off by about 2.4%, so even a product of three of them stays within roughly 7%.
1 KB (decimal) = 1,000 bytes; 1 KiB (binary) = 1,024 — about 2.4% more, drifting to ~10% by terabytes. For napkin math the two are interchangeable; for disks and billing, they are not.
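The drift between decimal and binary units can be checked in a few lines. This is a minimal sketch; the helper name `binary_over_decimal` is made up for illustration.

```python
# Compare binary (KiB, MiB, ...) and decimal (KB, MB, ...) units at each step.
PREFIXES = ["K", "M", "G", "T", "P"]


def binary_over_decimal(step: int) -> float:
    """Ratio of 2**(10*step) to 10**(3*step); step 1 is kibi vs kilo."""
    return 2 ** (10 * step) / 10 ** (3 * step)


for step, prefix in enumerate(PREFIXES, start=1):
    ratio = binary_over_decimal(step)
    print(f"1 {prefix}iB / 1 {prefix}B = {ratio:.3f} (+{(ratio - 1) * 100:.1f}%)")
```

The gap widens by about 2.4% per prefix, which is why it is harmless on a napkin and visible on a disk invoice.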
The latency ladder Slide 4
Orders of magnitude worth memorising. Use them to spot designs that assume the impossible — like "synchronously fan out to 50 services on the hot path."
| Operation | Approx time | Relative scale |
|---|---|---|
| L1 cache reference | ~0.5 ns | |
| Main memory (RAM) read | ~100 ns | |
| SSD random read (NVMe) | ~100 µs | |
| Round-trip inside a datacenter | ~500 µs | |
| Spinning-disk (HDD) seek | ~10 ms | |
| Cross-region RTT (EU ↔ US-East) | ~80 ms | |
| Cross-continent RTT (EU ↔ APAC) | ~200 ms | |
By the figures above, a RAM read is ~1,000× faster than an SSD random read; an SSD read is ~100× faster than a spinning-disk seek; a same-DC round trip sits between SSD and spinning disk; cross-region adds tens of milliseconds you cannot optimise away. Light in fibre travels at ~⅔ c, and that is the floor for every cross-continent hop.
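The ladder is easiest to use as a lookup table. The sketch below takes the table's values as given and adds two hypothetical helpers: one for the cost of calls made one after another, one for the speed-of-light floor over a fibre path. The 6,000 km distance in the example is an illustrative assumption, not a measured route.

```python
# Latency ladder from the table above, in nanoseconds.
LATENCY_NS = {
    "l1_cache": 0.5,
    "ram": 100,
    "ssd_random_read": 100_000,
    "dc_round_trip": 500_000,
    "hdd_seek": 10_000_000,
    "cross_region_rtt": 80_000_000,
    "cross_continent_rtt": 200_000_000,
}

SPEED_OF_LIGHT_KM_S = 299_792.458
FIBRE_FRACTION = 2 / 3  # light in fibre travels at roughly two thirds of c


def ratio(slower: str, faster: str) -> float:
    """How many times slower one operation is than another."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


def sequential_cost_ms(operation: str, count: int) -> float:
    """Total time in ms if `count` operations run strictly one after another."""
    return LATENCY_NS[operation] * count / 1e6


def fibre_rtt_floor_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fibre path of the given length."""
    return 2 * distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FRACTION) * 1000


print(ratio("ssd_random_read", "ram"))          # SSD vs RAM
print(sequential_cost_ms("dc_round_trip", 50))  # 50 serial in-DC calls
print(round(fibre_rtt_floor_ms(6_000), 1))      # assumed 6,000 km path
```

Fifty in-datacenter round trips made serially already cost about 25 ms before any service does real work, which is the kind of hidden assumption the ladder is meant to expose.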
The nines, and what they cost Slide 5
Each extra nine is roughly 10× harder. Past three nines, most outages come not from hardware but from deploys, config changes, and human error — so the investment shifts from redundancy to process.
| Availability | Annual downtime | Per month | Per day |
|---|---|---|---|
| 99% two nines | ~3.65 days | ~7.3 h | ~14.4 min |
| 99.9% three nines | ~8.77 h | ~43.8 min | ~86 s |
| 99.95% | ~4.38 h | ~21.9 min | ~43 s |
| 99.99% four nines | ~52.6 min | ~4.4 min | ~8.6 s |
| 99.999% five nines | ~5.26 min | ~26 s | ~0.86 s |
A request crossing three services each at 99.9% has a ceiling of ~99.7%. Every synchronous dependency taxes your headline number — caches, retries, and graceful degradation are how you claw it back. And 99.95% leaves only ~22 min/month: one botched deploy can burn half your error budget.
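Both the downtime table and the serial-dependency ceiling are one-line formulas. A minimal sketch, assuming a 365.25-day year, a month of one twelfth of that, and independent failures; the function names are illustrative.

```python
from math import prod

SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12


def downtime_seconds(availability: float, period_seconds: float) -> float:
    """Allowed downtime in a period for a given availability (0.999 = three nines)."""
    return (1 - availability) * period_seconds


def serial_availability(*components: float) -> float:
    """Ceiling for a request that must cross every component synchronously.

    Assumes failures are independent, which is optimistic in practice.
    """
    return prod(components)


print(downtime_seconds(0.999, SECONDS_PER_YEAR) / 3600)   # hours per year
print(downtime_seconds(0.9995, SECONDS_PER_MONTH) / 60)   # minutes per month
print(serial_availability(0.999, 0.999, 0.999))           # three services in a chain
```

Adding a fourth 99.9% dependency to the chain lowers the ceiling again, so the cheapest nine is often the synchronous call you remove.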
QPS — from users to load Slide 6
Start from the user count, layer on behaviour, then peakiness. Average QPS sizes the steady-state fleet; peak QPS sizes the headroom for worst-Tuesday-at-9pm.
Peak-to-average: 2×–5×
Most consumer products peak at 2–5× the daily average. A chat app peaks gently; a livestream catastrophically.
Split the verbs
Estimate reads and writes separately. Ratios of 10:1 or 100:1 imply very different cache and replication strategies.
Plan for 2× growth
Size for traffic 12–18 months out. Slight over-provisioning is cheap; an emergency re-architecture is not.
Carry both numbers
Average and peak answer different questions and cost very different amounts. Thread both all the way to the capacity answer.
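The chain from users to load can be written down so that every input stays visible. The sketch below is one way to do it; the 10 M daily users, 20 requests per user, 3× peak factor, and 10:1 read-to-write ratio are placeholder assumptions, not figures from the chapter.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: float, requests_per_user_per_day: float) -> float:
    """Steady-state queries per second, averaged over a whole day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 3.0, growth_factor: float = 2.0) -> float:
    """Peak load to provision for: average x peak-to-average x planned growth."""
    return avg_qps * peak_factor * growth_factor


def split_reads_writes(total_qps: float, reads_per_write: float) -> tuple[float, float]:
    """Split total QPS into (reads, writes) for a given read:write ratio."""
    writes = total_qps / (reads_per_write + 1)
    return total_qps - writes, writes


avg = average_qps(10_000_000, 20)       # hypothetical product
peak = peak_qps(avg)                    # 3x peak, 2x growth headroom
reads, writes = split_reads_writes(avg, reads_per_write=10)
print(f"avg ~{avg:,.0f} QPS, provision for ~{peak:,.0f} QPS")
print(f"reads ~{reads:,.0f} QPS, writes ~{writes:,.0f} QPS")
```

Keeping `avg` and `peak` as separate variables mirrors the advice above: the first sizes the steady-state fleet, the second sizes the headroom.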
Storage — four multipliers Slide 7
Skip any one of these and the answer is off by an order of magnitude.
Item size: median + tail
A "post" may average 400 B of text but carry a 2 MB image. Decompose: text, metadata, attachments, indexes. Add 20–30% for serialization.
Replication: ×3 (or more)
Most stores keep 3 replicas in-region; cross-region DR pushes it to 4–6×. Erasure coding claws some back at CPU/latency cost.
Growth compounds
If users and per-user content both grow, storage grows multiplicatively. Three years of 1.5× is 3.4×, not 4.5×.
The hidden tax
Indexes add 30–80%; logs often exceed the data they describe; backups multiply the whole figure again.
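The four multipliers compose as a plain product. The sketch below uses mid-range values from the cards above (25% serialization, 50% index overhead, 3 replicas, 1.5× annual growth for three years) as defaults; applying index overhead to the whole item and leaving out logs and backups are simplifying assumptions.

```python
def storage_bytes(
    items: float,
    item_bytes: float,
    serialization: float = 0.25,   # 20-30% serialization overhead
    index_overhead: float = 0.5,   # indexes add 30-80%
    replicas: int = 3,
    annual_growth: float = 1.5,
    years: int = 3,
) -> float:
    """Provisioned storage after overheads, replication, and compound growth.

    Logs and backups are deliberately excluded; add them as further multipliers.
    """
    raw = items * item_bytes * (1 + serialization)
    with_indexes = raw * (1 + index_overhead)
    replicated = with_indexes * replicas
    return replicated * annual_growth ** years


# Multiplier applied to every raw byte under the default assumptions.
print(storage_bytes(items=1, item_bytes=1))

# Compound growth alone: three years of 1.5x.
print(1.5 ** 3)
```

Under these defaults each raw byte becomes roughly 19 provisioned bytes, which is how dropping a single multiplier pushes an estimate toward an order-of-magnitude miss.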
Bandwidth — QPS × payload Slide 8
Throughput is the easy part. The catch: ingress and egress are rarely symmetric, and cloud egress is what shows up on the bill.
ingress_Bps = write_QPS × avg_request_size
A "10 Gbps" NIC moves at most ~1.25 GB/s on the wire, and less payload after protocol overhead and encryption. Plan for ~60–70% of nominal at peak. Confusing bits and bytes is an 8× error, the single most common one.
Egress dominates reads
A video service ingests ~1 MB/upload but streams ~500 MB/view — 500× asymmetry. Messaging is near-symmetric; search is read-heavy with tiny responses.
Where the bill hides
Within an AZ, bandwidth is ~free; across AZs, cents/GB; across regions or to the internet, many times more. Estimate egress by destination.
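The bandwidth arithmetic fits in three small functions. The ingress formula is the one given above; the egress formula is its mirror image (read QPS times response size), which is an assumption of this sketch rather than something the section states. The 65% utilisation default is the midpoint of the 60–70% planning range.

```python
def ingress_bytes_per_s(write_qps: float, avg_request_bytes: float) -> float:
    """ingress_Bps = write_QPS x avg_request_size."""
    return write_qps * avg_request_bytes


def egress_bytes_per_s(read_qps: float, avg_response_bytes: float) -> float:
    """Assumed mirror of the ingress formula for the read path."""
    return read_qps * avg_response_bytes


def nic_usable_bytes_per_s(nominal_gbps: float, utilisation: float = 0.65) -> float:
    """Convert a NIC rating in gigabits/s to plannable bytes/s.

    Dividing by 8 is the bits-to-bytes step that is easy to forget.
    """
    return nominal_gbps * 1e9 / 8 * utilisation


print(nic_usable_bytes_per_s(10, utilisation=1.0))  # wire maximum in bytes/s
print(nic_usable_bytes_per_s(10))                   # planning figure at 65%

# Per-unit asymmetry for the video example: 500 MB per view vs 1 MB per upload.
print(500e6 / 1e6)
```

Running the ingress and egress functions separately, and per destination, keeps the expensive direction from being averaged away.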
Worked example: a social feed Slide 9
Microblog Quill — size storage, read QPS, and egress for the home timeline.
Worked example: photo sharing Slide 10
Photo app Pebble — photos dominate storage and egress, the opposite shape from Quill.
Where envelope math goes wrong Slide 11
- Bits vs bytes. 10 Gbps is 1.25 GB/s, not 10 GB/s. An 8× error, and the most common single one.
- Sizing for average, not peak. Barely-enough at average means on-fire at peak.
- Forgetting replication and indexes. 3 replicas + 2 indexes + a backup easily multiply raw data by 8×.
- Forgetting growth. Today's storage is not the target — size for 18–24 months out.
- False precision. "4,728,193 QPS" is less credible than "~5 M", because it hides assumptions.
- Ignoring the long tail. A few celebrity/viral items can outweigh millions of typical ones — hot keys get their own line.
Rules that keep you honest Slide 12
Round liberally
One significant figure is plenty. "~50k QPS" is the right precision for napkin work.
Prefer ranges
"Between 8 and 12 PB" admits uncertainty honestly and gives reviewers something to push on.
Sanity-check end-to-end
If one machine "handles 10 M QPS" or a region needs 50 PB of RAM — stop, and compare to something real.
Show the inputs
The estimate's value is the trail of assumptions. Anyone should be able to change one input and re-run in 30 seconds.
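One way to apply these four rules together is to keep every input as a named low/high pair and round only at the end. A minimal sketch, assuming all factors are positive; the input names and values are hypothetical.

```python
from math import floor, log10, prod


def estimate_range(factors: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Multiply the low ends and the high ends of every (low, high) input.

    Valid only when every factor is positive.
    """
    low = prod(lo for lo, _ in factors.values())
    high = prod(hi for _, hi in factors.values())
    return low, high


def round_1sf(value: float) -> float:
    """Round to one significant figure."""
    if value == 0:
        return 0.0
    magnitude = 10 ** floor(log10(abs(value)))
    return round(value / magnitude) * magnitude


# Hypothetical inputs: change any one and re-run.
inputs = {
    "daily_active_users": (8e6, 12e6),
    "requests_per_user_per_day": (15, 25),
    "seconds_per_day_inverse": (1 / 86_400, 1 / 86_400),
}

low, high = estimate_range(inputs)
print(f"average QPS between ~{round_1sf(low):,.0f} and ~{round_1sf(high):,.0f}")
print(round_1sf(4_728_193))  # false precision collapsed to one significant figure
```

Because the assumptions live in one dictionary, a reviewer can challenge a single entry and see the range move, instead of arguing with a finished number.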
The goal isn't the right answer to four decimals. It's the right answer within a factor of two, fast enough to ask "what if?" five times before lunch.