Order-of-magnitude reasoning for capacity, latency, storage and bandwidth — the napkin math every systems engineer needs at their fingertips.
A back-of-the-envelope estimate is not a forecast — it is a feasibility check. It tells you in seconds whether a design is roughly sane, whether you need one box or one thousand, and where the real bottleneck is likely to live.
Interviewers care less about a precise answer than about whether you can decompose a vague problem into numbers — users, requests, payload, replication — and reach a defensible total without panic.
Before writing a design doc, a 10-minute estimate prevents both costly over-provisioning and embarrassing under-provisioning. It also surfaces the dimension that will dominate cost: storage, egress, or compute.
If a rough sketch already needs 80 petabytes of RAM or 4 Tbps of egress, the design is wrong — not the estimate. Better to learn that in a meeting than three sprints in.
Numbers anchor arguments. "It feels slow" becomes "p99 is 240 ms but the budget was 80 ms" — and now the team can act on it.
Memorise these once and storage math becomes mental arithmetic. Binary prefixes (KiB, MiB) climb in steps of 1,024; decimal prefixes (KB, MB) climb in steps of 1,000, so the gap widens as the numbers grow.
1 KB (decimal) = 1,000 bytes. 1 KiB (binary) = 1,024 bytes — about 2.4 % more. By the time you hit terabytes, binary units are roughly 10 % larger than their decimal cousins. For envelope work, the two are interchangeable; for billing and disks, they are not.
For mental math, treat 2¹⁰ ≈ 10³, 2²⁰ ≈ 10⁶, 2³⁰ ≈ 10⁹. Even after multiplying three factors of 2¹⁰, the cumulative error is only about 7 %, well inside the noise of a back-of-the-envelope answer.
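A minimal sketch of how the binary/decimal gap widens with each prefix, and therefore how far off the power-of-two shortcut is at each step. The function name `binary_excess_percent` is illustrative, not from any library.

```python
# Compare binary (IEC) and decimal (SI) prefixes at each step.
PREFIXES = ["K", "M", "G", "T", "P"]


def binary_excess_percent(power: int) -> float:
    """Percent by which the binary unit exceeds the decimal one.

    power=1 compares KiB with KB, power=2 compares MiB with MB, and so on.
    """
    return (1024 ** power / 1000 ** power - 1) * 100


for power, prefix in enumerate(PREFIXES, start=1):
    excess = binary_excess_percent(power)
    print(f"{prefix}iB vs {prefix}B: +{excess:.1f} %")
```

The same percentages are the error you accept when treating 2¹⁰ as 10³, 2²⁰ as 10⁶, and so on.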
Commonly-cited orders of magnitude. Use them to spot designs that quietly assume the impossible — like "synchronously fan out to 50 services on the hot path."
| Operation | Time (approx) | In ns |
|---|---|---|
| L1 cache reference | ~0.5 ns | 0.5 |
| Branch mispredict | ~5 ns | 5 |
| L2 cache reference | ~7 ns | 7 |
| Main memory (RAM) read | ~100 ns | 100 |
| Compress 1 KB with a fast codec | ~3 µs | 3,000 |
| Send 1 KB over a 10 Gbps NIC | ~1 µs | 1,000 |
| SSD random read (NVMe) | ~100 µs | 100,000 |
| Round-trip within a datacenter | ~500 µs | 500,000 |
| HDD seek | ~10 ms | 10,000,000 |
| Cross-region RTT (e.g. EU ↔ US-East) | ~80 ms | 80,000,000 |
| Cross-continent RTT (e.g. EU ↔ APAC) | ~200 ms | 200,000,000 |
RAM is ~1,000× faster than an NVMe SSD read; SSD is ~100× faster than a spinning-disk seek; a same-DC round trip sits between SSD and spinning disk; cross-region adds tens of milliseconds you cannot optimise away. Light in fibre travels at roughly two-thirds of c, and that sets the floor for every cross-continent hop.
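The fibre floor can be turned into a quick lower bound on round-trip time. This is a sketch under stated assumptions: the two-thirds factor comes from the text above, the 6,000 km path length is a hypothetical example, and real routes are longer than the straight-line distance and add switching delay on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBRE_FRACTION = 2 / 3             # rough speed in fibre relative to c


def min_rtt_ms(path_km: float) -> float:
    """Lower bound on round-trip time over a fibre path of the given length."""
    one_way_s = path_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FRACTION)
    return 2 * one_way_s * 1000


# Hypothetical 6,000 km path: propagation alone costs about 60 ms per round trip.
print(f"{min_rtt_ms(6_000):.0f} ms")
```

Any latency budget below this bound cannot be met by tuning software; the only fix is to move the data closer to the caller.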
Each additional nine is roughly 10× harder. Past three nines, most outages come not from hardware but from deploys, config changes and human error — so the marginal investment shifts from redundancy to process.
| Availability | Annual downtime | Monthly | Daily |
|---|---|---|---|
| 99 % "two nines" | ~3.65 days | ~7.3 hours | ~14.4 min |
| 99.9 % "three nines" | ~8.77 hours | ~43.8 min | ~86 sec |
| 99.95 % | ~4.38 hours | ~21.9 min | ~43 sec |
| 99.99 % "four nines" | ~52.6 min | ~4.4 min | ~8.6 sec |
| 99.999 % "five nines" | ~5.26 min | ~26 sec | ~0.86 sec |
A request that crosses three services each at 99.9 % has an effective ceiling of roughly 99.7 %. Every synchronous dependency taxes your headline number — caches, retries and graceful degradation are how you claw it back.
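The ceiling for a chain of synchronous dependencies is the product of their availabilities. A minimal sketch, assuming failures are independent (which real systems only approximate); `serial_availability` is an illustrative name.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a request that needs every listed service to succeed.

    Assumes independent failures; correlated outages break this model.
    """
    return prod(availabilities)


three_hops = serial_availability([0.999] * 3)
fifty_hops = serial_availability([0.999] * 50)
print(f"3 services at 99.9 %:  {three_hops:.2%}")
print(f"50 services at 99.9 %: {fifty_hops:.2%}")
```

The second line shows why a synchronous fan-out to dozens of services on the hot path quietly caps the headline number well below any single service's figure.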
99.95 % monthly leaves ~22 minutes to burn. A single botched 10-minute deploy consumes nearly half of it. Treat the budget as a real resource, not a vanity metric.
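The budget arithmetic can be sketched as follows, assuming an average month of 365.25 × 24 × 60 / 12 minutes. The function names are illustrative.

```python
MINUTES_PER_MONTH = 365.25 * 24 * 60 / 12  # average month: 43,830 minutes


def monthly_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per month allowed by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_MONTH


def budget_consumed(outage_minutes: float, availability_percent: float) -> float:
    """Fraction of the monthly error budget burned by a single outage."""
    return outage_minutes / monthly_budget_minutes(availability_percent)


budget = monthly_budget_minutes(99.95)
share = budget_consumed(10, 99.95)
print(f"budget: {budget:.1f} min, a 10-minute outage burns {share:.0%}")
```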
Start with the user count, then layer on behaviour and peakiness. Average QPS sizes your steady-state fleet; peak QPS sizes your worst-Tuesday-at-9pm capacity.
Most consumer products see peak QPS between 2× and 5× the daily average. Pick a sane multiplier based on the traffic shape — a chat app peaks gently, a livestream catastrophically.
Estimate reads and writes separately. Read:write ratios of 10:1 or 100:1 are common, and they imply very different storage, cache and replication strategies.
Size the design for traffic 12–18 months out, not today's load. The cost of slightly over-provisioning is tiny; the cost of an emergency re-architecture is enormous.
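The chain from user count to provisioned capacity can be sketched as below. Every input in the usage lines (10 M daily users, 20 requests per user per day, a 3× peak multiplier, a 10:1 read:write ratio, 1.5× growth headroom) is a hypothetical value chosen for illustration, not a figure from a real system.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: float, requests_per_user_per_day: float) -> float:
    """Steady-state requests per second averaged over a day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def split_reads_writes(total_qps: float, reads_per_write: float):
    """Split total QPS into (read_qps, write_qps) for a reads:1 ratio."""
    write_qps = total_qps / (reads_per_write + 1)
    return total_qps - write_qps, write_qps


def provisioned_peak_qps(avg_qps: float, peak_multiplier: float, growth_factor: float) -> float:
    """Peak QPS to size for, including growth headroom."""
    return avg_qps * peak_multiplier * growth_factor


# Hypothetical inputs, for illustration only.
avg = average_qps(10_000_000, 20)
reads, writes = split_reads_writes(avg, reads_per_write=10)
target = provisioned_peak_qps(avg, peak_multiplier=3, growth_factor=1.5)
print(f"avg ~{avg:,.0f} QPS (reads ~{reads:,.0f}, writes ~{writes:,.0f})")
print(f"size for ~{target:,.0f} QPS at peak")
```

Keeping the average and the peak as separate values makes it obvious which one a given capacity decision depends on.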
Four multipliers, applied in order: payload size, replication, growth, and overhead from indexes, logs and backups. Skip any one and your estimate can be off by up to an order of magnitude.
A "post" might average 400 bytes of text but carry a 2 MB image. Decompose: text, metadata, attachments, indexes. Add 20–30 % for serialization overhead and database internals you cannot see.
Most distributed stores keep 3 replicas inside a region. Cross-region disaster-recovery copies push the factor to 4–6×. Erasure coding can claw some back at the cost of CPU and latency.
If active users grow 20 % yearly and per-user content also grows, total storage grows multiplicatively, and the yearly factors compound. Three years of 1.5× growth is 1.5³ ≈ 3.4×, not 3 × 1.5 = 4.5×.
Indexes can add 30–80 % on top of raw data. Logs and audit trails often exceed the data they describe. Backups, depending on policy, multiply the whole figure again.
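The storage multipliers described above can be chained in a few lines. This is a sketch, not a sizing tool: the 100 TB raw figure is hypothetical, and the default overheads are picked from within the ranges quoted in the text (25 % serialization, 3 replicas, 50 % indexes).

```python
import math


def estimate_storage_bytes(
    raw_bytes: float,
    serialization_overhead: float = 0.25,  # text suggests 20-30 %
    replication_factor: float = 3,         # replicas inside one region
    yearly_growth: float = 1.5,            # compound growth per year
    years: int = 3,
    index_overhead: float = 0.5,           # text suggests 30-80 %
) -> float:
    """Apply the storage multipliers in order and return total bytes."""
    sized = raw_bytes * (1 + serialization_overhead)  # 1. payload on disk
    replicated = sized * replication_factor           # 2. replication
    grown = replicated * yearly_growth ** years       # 3. compound growth
    return grown * (1 + index_overhead)               # 4. index overhead


# Hypothetical starting point: 100 TB of raw data today.
total = estimate_storage_bytes(100e12)
print(f"~{total / 1e15:.1f} PB after three years")
```

With these assumptions 100 TB of raw data becomes roughly 1.9 PB, a factor of about 19, and logs and backups are not yet included.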
Throughput is the easy part. The catch is that ingress and egress are rarely symmetric, and cloud egress is what shows up on your bill.
A NIC sold as "10 Gbps" delivers at most ~1.25 GB/s, and less once protocol overhead, encryption and TCP headroom are accounted for. Plan to use ~60–70 % of nominal capacity at peak.
A video service might ingest 1 MB per upload but stream 500 MB per view — a 500× asymmetry. A messaging app is closer to symmetric. A search engine is read-heavy but with tiny responses.
Bandwidth inside an availability zone is essentially free; across zones it costs cents per GB; across regions or out to the internet it costs many times more. Estimating egress separately by destination is what turns a sticker shock into a deliberate choice.
If your design implies sustained egress above ~40 Gbps per server, you are probably missing a CDN or an edge cache.
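A minimal sketch of the NIC arithmetic: converting nominal bits per second to usable bytes per second, then counting servers for a given peak egress. The 65 % utilisation default sits inside the 60–70 % range above; the 200 Gbps peak is a hypothetical example.

```python
import math

BITS_PER_BYTE = 8


def nic_usable_gb_per_s(nominal_gbps: float, utilisation: float = 0.65) -> float:
    """Usable GB/s from a NIC rated in Gbps, at a planned utilisation."""
    return nominal_gbps / BITS_PER_BYTE * utilisation


def servers_for_egress(peak_egress_gbps: float, nic_gbps: float = 10,
                       utilisation: float = 0.65) -> int:
    """Servers needed to carry peak egress if the NIC is the only limit."""
    return math.ceil(peak_egress_gbps / (nic_gbps * utilisation))


print(f"10 Gbps NIC: ~{nic_usable_gb_per_s(10):.2f} GB/s usable")
# Hypothetical peak of 200 Gbps served from origin servers.
print(f"200 Gbps peak needs {servers_for_egress(200)} servers")
```

This counts only network capacity; CPU, disk and memory limits may bind first and should be estimated separately.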
Imagine a microblogging product called Quill. We want to size storage, read QPS and egress for the home timeline.
Now a photo product called Pebble. Photos dominate storage and egress — the opposite shape from Quill.
10 Gbps is not 10 GB/s — it is 1.25 GB/s before overhead. Confusing the two yields answers off by 8×, more than any other single mistake.
If your fleet is barely sufficient at the daily average, it is on fire at peak. Always carry a separate peak number through to the capacity answer.
Raw data is rarely what hits the disk. Three replicas, two secondary indexes and a backup easily multiply the raw figure by 8×.
"Today's storage" is not the design target. Size for the traffic you expect 18–24 months out, then add a comfortable buffer on top.
"4,728,193 QPS" is not more credible than "~5 M QPS" — it is less, because it hides the assumptions. Quote the rounded figure plus the inputs.
A few celebrity users or viral items can generate more load than millions of typical ones combined. Hot-key handling deserves its own line in the estimate.
"~50 k QPS" is the right precision for napkin work. Carrying four digits implies an accuracy you do not have and slows the math you need to do in your head.
"Between 8 and 12 PB" admits uncertainty honestly and gives reviewers something to push on. A single number invites false debate over the last digit.
If your answer says one machine handles 10 M QPS or that a region needs 50 PB of RAM, stop. Cross-check against published numbers or your own production telemetry.
The estimate's value is the trail of assumptions, not the final number. Anyone should be able to change one input and re-run the math in 30 seconds.
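One way to keep the assumptions visible is to write the estimate as a function of named inputs, carry a low–high band through the multiplication, and round only at the end. The sketch below assumes all quantities are positive (so low × low and high × high bound the product); the `Band` class, `round_sig` helper and every input value are hypothetical.

```python
import math
from dataclasses import dataclass


def round_sig(value: float, digits: int = 1) -> float:
    """Round to the given number of significant figures (default: one)."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    factor = 10.0 ** (exponent - digits + 1)
    return round(value / factor) * factor


@dataclass(frozen=True)
class Band:
    """A low-high range for a positive quantity."""
    low: float
    high: float

    def __mul__(self, other):
        if isinstance(other, Band):
            return Band(self.low * other.low, self.high * other.high)
        return Band(self.low * other, self.high * other)


# Hypothetical inputs: change one and re-run.
INPUTS = {
    "daily_active_users": Band(8e6, 12e6),
    "posts_per_user_per_day": Band(1, 2),
    "bytes_per_post": Band(500, 1_000),
    "days_retained": 365,
}


def yearly_storage_bytes(inputs) -> Band:
    """Raw bytes stored per year, as a low-high band."""
    return (inputs["daily_active_users"]
            * inputs["posts_per_user_per_day"]
            * inputs["bytes_per_post"]
            * inputs["days_retained"])


band = yearly_storage_bytes(INPUTS)
print(f"between ~{round_sig(band.low) / 1e12:.0f} and "
      f"~{round_sig(band.high) / 1e12:.0f} TB per year")
```

The printed band is deliberately coarse; the inputs dictionary is the part a reviewer should argue with.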
The goal is not the right answer to four decimal places. It is the right answer to within a factor of two, fast enough that you can ask "what if?" five times before lunch.
Users × behaviour × payload × replication × growth. Every estimate worth doing is a chain of small, defensible multipliers.
Carry both the average and the peak number all the way through. They answer different questions and they cost very different amounts of money.
Cache, RAM, SSD, disk, intra-DC, cross-region. Once these orders of magnitude are reflex, the rest of the math becomes fast.
Replication, indexes, logs and backups together usually bring the total to 5–10× the raw data figure. Skipping them is the single most common reason envelope estimates are wrong.
One significant figure, a low–high band, and a visible list of inputs. That is what makes the estimate useful instead of merely impressive.
Armed with these envelopes, we will start building real systems — load balancers, caches, replicated databases, and the trade-offs between them.