Chapter 2 · System Design Fundamentals

Back-of-the-Envelope Estimation

Order-of-magnitude reasoning for capacity, latency, storage, and bandwidth — the napkin math that tells you in seconds whether a design is sane, and where the real bottleneck will live.

Reading time ~13 min · Prerequisites: Chapter 1 · Next: Design a Rate Limiter

An estimate is not a forecast — it's a feasibility check. It tells you whether you need one box or a thousand, which dimension dominates cost, and whether the design quietly assumes something impossible. The whole skill is one move repeated: decompose a vague problem into a chain of small, defensible multipliers, then round hard.

Primary source

Alex Xu, System Design Interview (Vol. 1), Chapter 2; the latency ladder traces back to Jeff Dean's "Numbers Everyone Should Know." The Slide N label beside each heading points to the matching slide in the companion deck.

Why estimate at all Slide 2

The same ten-minute estimate pays off in four places — and in an interview, the number matters far less than whether you can get to it without panic.

In interviews

Show you can break a vague prompt into users, requests, payload, and replication — and reach a defensible total out loud.

In production

Prevents both costly over-provisioning and embarrassing under-provisioning, and surfaces which dimension dominates the bill.

In planning

If a rough sketch already needs 80 PB of RAM, the design is wrong. Better to learn that in a meeting than three sprints in.

In conversation

Numbers anchor arguments. "It feels slow" becomes "p99 is 240 ms but the budget was 80 ms" — now the team can act.

Powers of two Slide 3

Memorise these once and storage math becomes mental arithmetic. For envelope work, treat each binary step as its nearest power of ten: the error is ~2.4% per step, so even 2^30 is only ~7% above 10⁹.

2^10

1,024

Kilo · KiB ≈ 10³

2^20

~1.05 M

Mega · MiB ≈ 10⁶

2^30

~1.07 B

Giga · GiB ≈ 10⁹

2^40

~1.10 T

Tera · TiB ≈ 10¹²

2^50

~1.13 Q

Peta · PiB ≈ 10¹⁵

Binary vs decimal

1 KB (decimal) = 1,000 bytes; 1 KiB (binary) = 1,024 — about 2.4% more, drifting to ~10% by terabytes. For napkin math the two are interchangeable; for disks and billing, they are not.
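The drift between binary and decimal prefixes is easy to verify. A minimal sketch; the function name `binary_vs_decimal_drift` is illustrative and not part of the source material.

```python
def binary_vs_decimal_drift(step: int) -> float:
    """Fraction by which 2**(10*step) exceeds 10**(3*step)."""
    return 2 ** (10 * step) / 10 ** (3 * step) - 1


# Ki, Mi, Gi, Ti, Pi correspond to steps 1..5 in the table above.
for step, prefix in enumerate(["Ki", "Mi", "Gi", "Ti", "Pi"], start=1):
    print(f"{prefix}B vs decimal: +{binary_vs_decimal_drift(step):.1%}")
```

The gap grows by roughly 2.4 percentage points per step, which is why it is harmless on a napkin but visible on a disk invoice.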

The latency ladder Slide 4

Orders of magnitude worth memorising. Use them to spot designs that assume the impossible — like "synchronously fan out to 50 services on the hot path."

Operation	Approx time
L1 cache reference	~0.5 ns
Main memory (RAM) read	~100 ns
SSD random read (NVMe)	~100 µs
Round-trip inside a datacenter	~500 µs
Spinning-disk (HDD) seek	~10 ms
Cross-region RTT (EU ↔ US-East)	~80 ms
Cross-continent RTT (EU ↔ APAC)	~200 ms

The reflex to keep

RAM is ~1,000× faster than SSD; SSD is ~100× faster than spinning disk; a same-DC round trip sits between SSD and spinning disk; cross-region adds tens of milliseconds you cannot optimise away. Light in fibre travels at ~⅔ c, and that is the floor for every cross-continent hop.
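The ladder is most useful as ratios and as a budget check. The sketch below copies the figures from the table and derives three things: the ratio between rungs, the cost of chaining calls one after another, and the fibre floor for a given path. The helper names, the rounded 200,000 km/s fibre speed, and the 6,000 km path are assumptions for illustration only.

```python
# Latency figures copied from the table above, in nanoseconds.
LATENCY_NS = {
    "l1_cache": 0.5,
    "ram": 100,
    "ssd_random_read": 100_000,
    "same_dc_rtt": 500_000,
    "hdd_seek": 10_000_000,
    "cross_region_rtt": 80_000_000,
    "cross_continent_rtt": 200_000_000,
}

# Assumption: light in fibre at ~2/3 c, rounded to 200,000 km/s.
FIBRE_KM_PER_S = 200_000


def ratio(slower: str, faster: str) -> float:
    """How many times slower one operation is than another."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


def chained_cost_ms(op: str, calls: int) -> float:
    """Total time in ms if `calls` operations run strictly one after another."""
    return LATENCY_NS[op] * calls / 1e6


def fibre_rtt_floor_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fibre path of the given length."""
    return 2 * distance_km * 1000 / FIBRE_KM_PER_S


print(ratio("ssd_random_read", "ram"))       # SSD vs RAM
print(ratio("hdd_seek", "ssd_random_read"))  # HDD vs SSD
print(chained_cost_ms("same_dc_rtt", 50))    # 50 chained same-DC round trips
print(fibre_rtt_floor_ms(6_000))             # hypothetical 6,000 km path
```

Fifty chained same-DC round trips already cost 25 ms before any service does real work, which is why long synchronous call chains on the hot path deserve suspicion.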

The nines, and what they cost Slide 5

Each extra nine is roughly 10× harder. Past three nines, most outages come not from hardware but from deploys, config changes, and human error — so the investment shifts from redundancy to process.

Availability	Annual downtime	Per month	Per day
99% two nines	~3.65 days	~7.3 h	~14.4 min
99.9% three nines	~8.77 h	~43.8 min	~86 s
99.95%	~4.38 h	~21.9 min	~43 s
99.99% four nines	~52.6 min	~4.4 min	~8.6 s
99.999% five nines	~5.26 min	~26 s	~0.86 s

Watch out · nines multiply downward

A request crossing three services each at 99.9% has a ceiling of ~99.7%. Every synchronous dependency taxes your headline number — caches, retries, and graceful degradation are how you claw it back. And 99.95% leaves only ~22 min/month: one botched deploy can burn half your error budget.
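Both the downtime table and the "nines multiply downward" warning reduce to two short functions. A minimal sketch, assuming a 365.25-day year and fully independent failures between services; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average year length


def downtime_seconds(availability: float,
                     period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Allowed downtime within a period for a given availability (0..1)."""
    return (1 - availability) * period_seconds


def serial_availability(*availabilities: float) -> float:
    """Ceiling for a request that needs every listed dependency to succeed.

    Assumes failures are independent, which is optimistic in practice.
    """
    result = 1.0
    for a in availabilities:
        result *= a
    return result


print(downtime_seconds(0.999) / 3600)                         # hours per year
print(downtime_seconds(0.9995, SECONDS_PER_YEAR / 12) / 60)   # minutes per month
print(serial_availability(0.999, 0.999, 0.999))               # three services
```

Adding a fourth 99.9% dependency to the last call shows how quickly a synchronous chain eats the headline number.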

QPS — from users to load Slide 6

Start from the user count, layer on behaviour, then peakiness. Average QPS sizes the steady-state fleet; peak QPS sizes the headroom for worst-Tuesday-at-9pm.

Behaviour and peakiness are the two multipliers people forget. A day is 86,400 s ≈ 10⁵ — worth memorising.

Peak-to-average: 2×–5×

Most consumer products peak at 2–5× the daily average. A chat app peaks gently; a livestream catastrophically.

Split the verbs

Estimate reads and writes separately. Ratios of 10:1 or 100:1 imply very different cache and replication strategies.

Plan for 2× growth

Size for traffic 12–18 months out. Slight over-provisioning is cheap; an emergency re-architecture is not.

Carry both numbers

Average and peak answer different questions and cost very different amounts. Thread both all the way to the capacity answer.
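The chain from users to load fits in one function that returns both numbers. A minimal sketch; the function name, the default peak factor of 3, and the sample inputs are hypothetical and only illustrate the shape of the calculation.

```python
SECONDS_PER_DAY = 86_400  # roughly 10**5


def qps(daily_active_users: float,
        actions_per_user_per_day: float,
        peak_factor: float = 3.0) -> tuple[float, float]:
    """Return (average QPS, peak QPS) for one kind of action."""
    average = daily_active_users * actions_per_user_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


# Hypothetical product: 10 M DAU, 20 reads and 1 write per user per day.
read_avg, read_peak = qps(10_000_000, 20)
write_avg, write_peak = qps(10_000_000, 1)
print(f"reads:  ~{read_avg:,.0f}/s avg, ~{read_peak:,.0f}/s peak")
print(f"writes: ~{write_avg:,.0f}/s avg, ~{write_peak:,.0f}/s peak")
```

Calling it once per verb keeps reads and writes separate, and returning a pair keeps average and peak together all the way to the capacity answer.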

Storage — four multipliers Slide 7

Skip any one of these and the answer is off by an order of magnitude.

total_storage = items/day × avg_size × 365 × replicas × retention_years

Item size: median + tail

A "post" may average 400 B of text but carry a 2 MB image. Decompose: text, metadata, attachments, indexes. Add 20–30% for serialization.

Replication: ×3 (or more)

Most stores keep 3 replicas in-region; cross-region DR pushes it to 4–6×. Erasure coding claws some back at CPU/latency cost.

Growth compounds

If users and per-user content both grow, storage grows multiplicatively. Three years of 1.5× annual growth is 1.5³ ≈ 3.4×, not 3 × 1.5 = 4.5×.

The hidden tax

Indexes add 30–80%; logs often exceed the data they describe; backups multiply the whole figure again.
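The storage formula and the compounding rule can be written down directly. A sketch under stated assumptions: the `overhead` parameter is an addition for this example (it stands in for the index and serialization tax described above), and the sample inputs are hypothetical.

```python
def storage_bytes(items_per_day: float,
                  avg_item_bytes: float,
                  replicas: int = 3,
                  retention_years: float = 1.0,
                  overhead: float = 1.0) -> float:
    """items/day x avg_size x 365 x replicas x retention_years, times overhead.

    `overhead` is not in the source formula; 1.5 would model +50% for indexes.
    """
    return (items_per_day * avg_item_bytes * 365
            * replicas * retention_years * overhead)


def compound_growth(rate_per_year: float, years: int) -> float:
    """Total growth multiplier after compounding an annual rate."""
    return rate_per_year ** years


# Hypothetical: 1 M items/day at 1 KB each, 3 replicas, kept for 1 year.
print(storage_bytes(1_000_000, 1_000) / 1e12, "TB")
print(compound_growth(1.5, 3))  # three years at 1.5x per year
```

Keeping every multiplier as a named argument makes an omitted one visible: a call without `replicas` or `overhead` silently uses the defaults, so it pays to pass them explicitly in a review.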

Bandwidth — QPS × payload Slide 8

Throughput is the easy part. The catch: ingress and egress are rarely symmetric, and cloud egress is what shows up on the bill.

egress_Bps = read_QPS × avg_response_size
ingress_Bps = write_QPS × avg_request_size

Watch out · bits vs bytes

A "10 Gbps" NIC delivers ~1.25 GB/s of payload — and less after protocol overhead and encryption. Plan for ~60–70% of nominal at peak. Confusing bits and bytes is an 8× error, the single most common one.

Egress dominates reads

A video service ingests ~1 MB/upload but streams ~500 MB/view — 500× asymmetry. Messaging is near-symmetric; search is read-heavy with tiny responses.

Where the bill hides

Within an AZ, bandwidth is ~free; across AZs, cents/GB; across regions or to the internet, many times more. Estimate egress by destination.
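The two bandwidth formulas and the bits-to-bytes conversion look like this in code. A minimal sketch; the function names and the sample traffic figures are hypothetical, and the 0.65 usable fraction is simply the midpoint of the 60–70% range quoted above.

```python
def throughput_bytes_per_sec(qps: float, avg_payload_bytes: float) -> float:
    """egress = read_QPS x response size; ingress = write_QPS x request size."""
    return qps * avg_payload_bytes


def gbps_to_bytes_per_sec(gbps: float, usable_fraction: float = 1.0) -> float:
    """Convert a link rating in gigabits/s to bytes/s (divide bits by 8)."""
    return gbps * 1e9 / 8 * usable_fraction


# Hypothetical service: 20k reads/s at 50 KB, 500 writes/s at 2 KB.
egress = throughput_bytes_per_sec(20_000, 50_000)
ingress = throughput_bytes_per_sec(500, 2_000)
usable_link = gbps_to_bytes_per_sec(10, usable_fraction=0.65)

print(f"egress  ~{egress / 1e9:.1f} GB/s, ingress ~{ingress / 1e6:.0f} MB/s")
print(f"one 10 Gbps NIC carries ~{usable_link / 1e9:.2f} GB/s usable")
print(f"NICs needed for egress: ~{egress / usable_link:.1f}")
```

Doing the division by 8 in exactly one place is a cheap guard against the 8× error.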

Worked example: a social feed Slide 9

Microblog Quill — size storage, read QPS, and egress for the home timeline.

1 · Inputs

DAU = 40 M · posts/user/day = 0.5 · reads/user/day = 30 · post = 350 B text + 80 B meta

2 · Writes/s

40e6 × 0.5 / 86,400 ≈ 230/s avg · peak ×3 ≈ 700/s

3 · Reads/s

40e6 × 30 / 86,400 ≈ 13.9k/s avg · peak ×3 ≈ 42k/s

4 · Raw storage

20M posts/day × 430 B ≈ 8.6 GB/day × 365 ≈ 3.1 TB/yr

5 · With ×3 replicas + indexes

3.1 TB × 3 × 1.5 ≈ 14 TB/yr durable

6 · Peak read egress

42k/s × 12 KB/page ≈ 500 MB/s ≈ 4 Gbps (before images)

Conclusion

Storage is small (~14 TB/yr). Read fan-out, not storage, is the design driver — needs a hot-set cache for celebrity authors.
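The six Quill steps can be re-run as a short script, so that changing one input updates every downstream number. This is a sketch only: the function name and dictionary keys are illustrative, the ×1.5 index overhead and 12 KB page size are taken from steps 5 and 6, and KB/MB/TB are treated as decimal units.

```python
SECONDS_PER_DAY = 86_400


def quill_estimate(dau: float = 40_000_000,
                   posts_per_user: float = 0.5,
                   reads_per_user: float = 30,
                   post_bytes: int = 350 + 80,   # text + metadata
                   peak_factor: float = 3,
                   replicas: int = 3,
                   index_overhead: float = 1.5,
                   page_bytes: int = 12_000) -> dict[str, float]:
    """Reproduce the Quill worked example from its stated inputs."""
    write_avg = dau * posts_per_user / SECONDS_PER_DAY
    read_avg = dau * reads_per_user / SECONDS_PER_DAY
    read_peak = read_avg * peak_factor
    raw_per_year = dau * posts_per_user * post_bytes * 365
    durable_per_year = raw_per_year * replicas * index_overhead
    peak_egress = read_peak * page_bytes  # bytes per second
    return {
        "write_avg_qps": write_avg,
        "write_peak_qps": write_avg * peak_factor,
        "read_avg_qps": read_avg,
        "read_peak_qps": read_peak,
        "raw_tb_per_year": raw_per_year / 1e12,
        "durable_tb_per_year": durable_per_year / 1e12,
        "peak_egress_gbps": peak_egress * 8 / 1e9,
    }


for name, value in quill_estimate().items():
    print(f"{name}: {value:,.1f}")
```

Re-running with `reads_per_user=60` doubles the peak read rate and egress while leaving storage untouched, which is the conclusion above in executable form.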

Worked example: photo sharing Slide 10

Photo app Pebble — photos dominate storage and egress, the opposite shape from Quill.

1 · Inputs

DAU = 8 M · uploads/user/day = 0.4 · views/user/day = 60 · photo + thumbs ≈ 2.4 MB

2 · Daily new bytes

3.2M uploads/day × 2.4 MB ≈ 7.7 TB/day

3 · Annual storage, replicated

7.7 TB × 365 × 3 ≈ 8.4 PB/yr (+ ~10% metadata)

4 · Views/s

8e6 × 60 / 86,400 ≈ 5.6k/s avg · peak ×4 ≈ 22k/s

5 · Peak egress

22k/s × ~310 KB/view ≈ 6.8 GB/s ≈ 55 Gbps

Conclusion

Petabyte-scale object store + tiering; egress demands a CDN with a high hit ratio. The upload path is tiny — a few dozen workers.
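The Pebble numbers follow the same pattern. A sketch under the example's own inputs; the function name and keys are illustrative, units are decimal, and the upload rate at the end is derived here rather than stated in the steps. Because nothing is rounded between steps, peak egress comes out slightly above the hand-rounded 6.8 GB/s.

```python
SECONDS_PER_DAY = 86_400


def pebble_estimate(dau: float = 8_000_000,
                    uploads_per_user: float = 0.4,
                    views_per_user: float = 60,
                    photo_bytes: float = 2.4e6,   # photo + thumbnails
                    view_bytes: float = 310e3,    # bytes served per view
                    peak_factor: float = 4,
                    replicas: int = 3) -> dict[str, float]:
    """Reproduce the Pebble worked example from its stated inputs."""
    uploads_per_day = dau * uploads_per_user
    daily_bytes = uploads_per_day * photo_bytes
    annual_replicated = daily_bytes * 365 * replicas
    view_avg = dau * views_per_user / SECONDS_PER_DAY
    view_peak = view_avg * peak_factor
    peak_egress = view_peak * view_bytes  # bytes per second
    return {
        "daily_tb": daily_bytes / 1e12,
        "annual_replicated_pb": annual_replicated / 1e15,
        "view_avg_qps": view_avg,
        "view_peak_qps": view_peak,
        "peak_egress_gbps": peak_egress * 8 / 1e9,
        "upload_avg_qps": uploads_per_day / SECONDS_PER_DAY,
    }


for name, value in pebble_estimate().items():
    print(f"{name}: {value:,.2f}")
```

The average upload rate of a few dozen per second next to tens of thousands of views per second is what makes the upload path small and the read path the CDN's problem.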

Where envelope math goes wrong Slide 11

Bits vs bytes. 10 Gbps is 1.25 GB/s, not 10. An 8× error — the worst single one.
Sizing for average, not peak. Barely-enough at average means on-fire at peak.
Forgetting replication and indexes. 3 replicas + 2 indexes + a backup easily multiply raw data by 8×.
Forgetting growth. Today's storage is not the target — size for 18–24 months out.
False precision. "4,728,193 QPS" is less credible than "~5 M", because it hides assumptions.
Ignoring the long tail. A few celebrity/viral items can outweigh millions of typical ones — hot keys get their own line.

Rules that keep you honest Slide 12

Round liberally

One significant figure is plenty. "~50k QPS" is the right precision for napkin work.

Prefer ranges

"Between 8 and 12 PB" admits uncertainty honestly and gives reviewers something to push on.

Sanity-check end-to-end

If one machine "handles 10 M QPS" or a region needs 50 PB of RAM — stop, and compare to something real.

Show the inputs

The estimate's value is the trail of assumptions. Anyone should be able to change one input and re-run in 30 seconds.

North star

The goal isn't the right answer to four decimals. It's the right answer within a factor of two, fast enough to ask "what if?" five times before lunch.
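"Prefer ranges" and "show the inputs" combine naturally: carry a low and a high value through the same chain of multipliers. A minimal sketch, assuming all quantities are non-negative (so low times low stays the lower bound); the `Range` class and the input figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A low/high estimate for a non-negative quantity."""
    low: float
    high: float

    def __mul__(self, other: "Range | float") -> "Range":
        # Valid only for non-negative values: bounds multiply pairwise.
        if isinstance(other, Range):
            return Range(self.low * other.low, self.high * other.high)
        return Range(self.low * other, self.high * other)


# Hypothetical inputs, each stated as a range instead of a point.
uploads_per_day = Range(3e6, 4e6)
bytes_per_upload = Range(2e6, 3e6)
replicas = 3

annual = uploads_per_day * bytes_per_upload * 365 * replicas
print(f"between ~{annual.low / 1e15:.0f} and ~{annual.high / 1e15:.0f} PB/yr")
```

The width of the final range shows how much the answer depends on the least certain input, which tells you where measuring is worth more than arguing.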

Active recall

Cover the answers. Say each number out loud before you check.

For mental math, 2¹⁰, 2²⁰, 2³⁰ ≈ which powers of ten?

Round each binary step to a decimal one.

2¹⁰ ≈ 10³ (thousand), 2²⁰ ≈ 10⁶ (million), 2³⁰ ≈ 10⁹ (billion). Each step drifts ~2.4%, so 2³⁰ is only ~7% above 10⁹, which is fine for envelope work.

Roughly how much faster is RAM than SSD, and SSD than HDD?

Two different ratios.

~1,000× and ~100×. RAM ~100 ns, SSD ~100 µs, HDD seek ~10 ms. A same-DC round trip (~500 µs) sits between SSD and HDD.

How much annual downtime does 99.9% allow?

"Three nines."

~8.8 hours/year (~43.8 min/month). Each extra nine is ~10× less downtime and ~10× harder to reach.

Seconds in a day, and why it matters?

It's the QPS denominator.

86,400 ≈ 10⁵. You divide (users × actions/day) by it to get average QPS, then multiply by a 2–5× peak factor.

A "10 Gbps" link delivers how many bytes per second?

Bits, not bytes.

~1.25 GB/s (÷8), and less after overhead — plan ~60–70% usable. Confusing bits and bytes is an 8× error.

The storage formula — name all five multipliers.

In order.

items/day × avg_size × 365 × replicas × retention_years. Forgetting replication, indexes, or growth is the usual reason estimates are 5–10× low.

Check yourself

Q1 A NIC is rated "10 Gbps." Roughly how much payload per second is that?

Why: Divide bits by 8: 10/8 = 1.25 GB/s, and less after overhead. Bits-vs-bytes is the classic 8× mistake.

Q2 Compared with the previous one, each additional nine of availability is roughly…

Why: Each nine cuts allowed downtime ~10×. Past three nines, the limiting factor becomes deploys and human error, not hardware.

Q3 How many seconds are in a day (the QPS denominator)?

Why: 60 × 60 × 24 = 86,400 ≈ 10⁵. Divide daily actions by it for average QPS.

Q4 To turn raw daily data into a yearly storage figure, you must also multiply by…

Why: Replication (×3+), indexes (+30–80%), and compounding growth together usually inflate the raw figure 5–10×; leaving them out is the most common omission.

Q5 In the Quill social-feed example, what turned out to be the design driver?

Why: Storage was tiny (~14 TB/yr) but peak reads hit ~42k/s — read fan-out (and hot-author caching) dominates the design.