Chapter 3 · System Design Fundamentals

A Framework for System Design Interviews

Open-ended design questions feel chaotic only when you walk in without a plan. Here is a four-step routine that turns the next 45 minutes into a guided conversation about tradeoffs.

Reading time: ~12 min · Prerequisites: Ch 1, Ch 2

The crucial reframe: the interviewer is grading a thought process, not a finished blueprint. No real system fits on a whiteboard in an hour. What the panel probes is how you reason under ambiguity, how you choose between imperfect options, and how you collaborate while doing it.

Primary source

Alex Xu, System Design Interview (Vol. 1), Chapter 3. Companion deck: slides — jump to any slide with the Slide N chips.

What is actually measured Slide 2

Four signals sit underneath every prompt — the architecture you land on matters far less than these.

1 · Problem framing

Can you turn a vague prompt into a concrete, bounded problem with explicit assumptions and a target scale?

2 · Component reasoning

Do you know which building blocks exist, what each is good and bad at, and how they snap together?

3 · Tradeoff fluency

When you pick something, can you name what you gave up — latency, cost, consistency, operational load?

4 · Collaboration

Do you absorb hints, change course gracefully, and treat the interviewer as a teammate, not a judge?

The four-step spine Slide 3

Whether the question is a chat app, a ride-share dispatcher, or a metrics pipeline, the same loop applies. Run it deliberately and the interview stops feeling like improv. The minutes below assume a 45-minute slot.

1. Understand & scope (Slide 4): Clarify what's in and out of bounds; pin functional + non-functional requirements. ~8 min.
2. Sketch the high-level design (Slide 5): Boxes, arrows, the data model, and a few APIs; agree the skeleton. ~12 min.
3. Deep dive (Slide 6): Take the interviewer's pointer; unpack one or two components a level down. ~18 min.
4. Wrap up (Slide 7): Name the bottlenecks, the next moves, and how you'd operate it. ~5 min.

Steps are sequential — but return to step 1 if a deep dive uncovers a missed requirement.

Step 1 — Understand & scope Slide 4

The biggest losses come from solving a problem the interviewer didn't ask you to solve. Clarifying questions are not a delay; they are the design. Surface assumptions out loud and write the agreed scope in the corner of the board.

Vague prompt: "Design a feed"
Clarifying questions: Who posts? Who reads? How many users? Photos?
Bounded scope: 10M DAU · text + photo · eventual consistency · no DMs
Narrow the prompt before any boxes are drawn.

Functional — what must it do?

The 2–3 non-negotiable user flows; what's explicitly out of scope; and whether it's read- or write-heavy (that one answer flips the architecture).

Non-functional — how well?

Scale (DAU, peak QPS, payload), latency & availability targets (p99, SLO), consistency tolerance, and hard constraints (region, regulation, mobile-first).
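The scale numbers above are easier to defend once they are written down as arithmetic. Here is a minimal back-of-envelope sketch. Only the 10M DAU figure comes from the scoping example; the per-user read and write counts, the payload size, and the peak factor are placeholder assumptions you would confirm with the interviewer.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class ScaleAssumptions:
    """Every field is an assumption to confirm out loud, not a fact."""
    daily_active_users: int       # 10M DAU from the scoping example
    reads_per_user_per_day: int   # hypothetical
    writes_per_user_per_day: int  # hypothetical
    avg_payload_bytes: int        # hypothetical average size of one write
    peak_factor: float            # hypothetical peak-to-average multiplier


def estimate(a: ScaleAssumptions) -> dict:
    """Return rough average/peak QPS and daily storage growth."""
    avg_read_qps = a.daily_active_users * a.reads_per_user_per_day / SECONDS_PER_DAY
    avg_write_qps = a.daily_active_users * a.writes_per_user_per_day / SECONDS_PER_DAY
    daily_bytes = a.daily_active_users * a.writes_per_user_per_day * a.avg_payload_bytes
    return {
        "avg_read_qps": round(avg_read_qps),
        "peak_read_qps": round(avg_read_qps * a.peak_factor),
        "avg_write_qps": round(avg_write_qps),
        "peak_write_qps": round(avg_write_qps * a.peak_factor),
        "storage_gb_per_day": round(daily_bytes / 1e9, 1),
        "read_write_ratio": round(a.reads_per_user_per_day / a.writes_per_user_per_day, 1),
    }


feed = ScaleAssumptions(
    daily_active_users=10_000_000,
    reads_per_user_per_day=20,
    writes_per_user_per_day=2,
    avg_payload_bytes=1_000,
    peak_factor=3.0,
)
numbers = estimate(feed)
for name, value in numbers.items():
    print(f"{name}: {value}")
```

With these placeholder inputs the system is read-heavy by about 10 to 1, which is the kind of answer that flips the architecture toward caching and read replicas.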

Step 2 — High-level design Slide 5

Sketch the main components and how requests flow between them. Keep it deliberately shallow — skeleton, not finished product. Five or six boxes is plenty.

Client (web / mobile)
Load balancer (+ TLS, rate-limit)
API service (stateless)
Cache (hot reads)
Primary DB (sharded by user_id)
Queue (async)
Workers (fan-out, indexing)
Anchor it with a contract: 3–4 API endpoints, 2–3 core entities, and one sentence on SQL vs NoSQL.
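A contract like this fits in a dozen lines on the board. The sketch below shows one possible shape for the feed example. The entity names, method names, and routes are all hypothetical, and the modulo-hash shard router is just one simple way to realise "sharded by user_id".

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Protocol


@dataclass(frozen=True)
class User:
    """Core entity 1 (hypothetical fields)."""
    user_id: str
    handle: str


@dataclass(frozen=True)
class Post:
    """Core entity 2 (hypothetical fields)."""
    post_id: str
    author_id: str
    text: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class FeedApi(Protocol):
    """Three hypothetical endpoints; the routes in comments are illustrative."""

    def create_post(self, author_id: str, text: str) -> Post: ...        # POST /v1/posts
    def get_feed(self, user_id: str, limit: int = 20) -> List[Post]: ...  # GET /v1/feed
    def follow(self, follower_id: str, followee_id: str) -> None: ...     # POST /v1/follows


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user_id to a shard with a stable hash.

    Python's built-in hash() is randomised per process for strings,
    so a cryptographic digest is used to keep routing deterministic.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for("user-42", num_shards=8))
```

Plain modulo hashing remaps most keys when the shard count changes, which is a natural tradeoff to name if the interviewer asks about resharding.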

Step 3 — Deep dive Slide 6

This is where most of the signal is produced. The interviewer almost always nudges you — "let's talk about how the feed is generated" or "what if a worker dies mid-job?" Take the hint. Pick one or two components and unpack them.
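A nudge like "what if a worker dies mid-job?" usually leads to at-least-once delivery, idempotency, and dead-letter handling. The following is a minimal in-memory sketch of that idea, not a real queue client: the job shape, `MAX_ATTEMPTS`, and `run_worker` are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set

MAX_ATTEMPTS = 3  # hypothetical retry limit before a job is parked


def run_worker(
    queue: Deque[Dict],
    handler: Callable[[str], None],
    processed_ids: Set[str],
    dead_letters: List[Dict],
) -> None:
    """Drain the queue with at-least-once semantics.

    A job counts as done only after the handler succeeds. Failed jobs
    are re-enqueued until MAX_ATTEMPTS, then moved to dead_letters.
    """
    while queue:
        job = queue.popleft()
        if job["id"] in processed_ids:
            continue  # duplicate delivery: already applied, so skip it
        try:
            handler(job["payload"])
        except Exception:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(job)  # park for manual inspection
            else:
                queue.append(job)  # retry later
            continue
        processed_ids.add(job["id"])  # record success for deduplication


applied: List[str] = []


def handler(payload: str) -> None:
    if payload == "poison":
        raise ValueError("cannot process")
    applied.append(payload)


jobs: Deque[Dict] = deque([
    {"id": "a", "payload": "fan-out post 1"},
    {"id": "a", "payload": "fan-out post 1"},  # redelivered duplicate
    {"id": "b", "payload": "poison"},          # always fails
])
processed: Set[str] = set()
dead: List[Dict] = []
run_worker(jobs, handler, processed, dead)
print(applied, [job["id"] for job in dead])
```

In a real system the deduplication record and the side effect would need to be committed together, or the handler itself made idempotent, because a crash between the two steps causes a redelivery.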

What to unpack

Data layer (schema, partition key, hot keys); cache strategy (write-through vs aside, TTL, invalidation); async pipeline (at-least-once, idempotency, dead-letters); a specific algorithm; or failure modes.
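For the cache item in that list, it helps to be able to walk one read and one write through a concrete strategy. Below is a minimal sketch of cache-aside with a TTL and invalidate-on-write. The `CacheAside` class, the dict standing in for the database, and the injectable clock are all illustrative assumptions, not any particular library's API.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Cache-aside: the application checks the cache, then falls back to the DB."""

    def __init__(
        self,
        db: Dict[str, str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._db = db                  # stand-in for the primary database
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be demonstrated
        self._cache: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self.db_reads = 0              # counter to make cache misses visible

    def read(self, key: str) -> Optional[str]:
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]            # cache hit, still fresh
        self.db_reads += 1             # miss or expired: go to the database
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = (value, self._clock() + self._ttl)
        return value

    def write(self, key: str, value: str) -> None:
        self._db[key] = value          # the database stays the source of truth
        self._cache.pop(key, None)     # invalidate; the next read repopulates


now = [0.0]
store = CacheAside(db={"post:1": "hello"}, ttl_seconds=30, clock=lambda: now[0])
first = store.read("post:1")    # miss, loads from the database
second = store.read("post:1")   # hit
store.write("post:1", "edited")
third = store.read("post:1")    # miss after invalidation
now[0] = 31.0
fourth = store.read("post:1")   # miss after the TTL expires
print(first, second, third, fourth, store.db_reads)
```

The cost to name here is staleness: another writer that bypasses `write` leaves the old value visible until the TTL runs out. Write-through trades that for extra latency on every write.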

How to dive without drowning

Quantify first ("50k writes/s, so one Postgres won't fit"); name 2–3 real options; pick one and state its cost; walk a read and a write through out loud; then stop — don't ramble past the answer.

The redirect is the signal

If the interviewer steers you elsewhere, drop the current thread immediately and follow the new pointer. It's routing, not interruption.

Step 4 — Wrap up Slide 7

Spend the last minutes proving you can see your own design clearly.

Call out bottlenecks

Be specific: "the single-primary write path saturates near X QPS; the cold-start cache-miss storm is the other risk."

Sketch the next moves

Shard further, add a per-region read replica, batch writes through a buffer, introduce a CDN — framed as the next conversation, not promises.

Operate it

What do you log, what do you alert on, what's the SLI? Mention deploys, on-call, and a failure-recovery playbook.
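If the interviewer asks what the SLI would be, a request-based availability ratio is a common starting point. The sketch below shows the arithmetic under assumed numbers: the 99.9% target, the 30-day window, and the request counts are illustrative, and the function names are hypothetical.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded; 1.0 when there was no traffic."""
    if total_requests == 0:
        return 1.0
    return good_requests / total_requests


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full downtime the SLO tolerates over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget still unspent; negative means the SLO is breached."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return 1.0 - (1.0 - sli) / (1.0 - slo)


SLO = 0.999  # assumed availability target
sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
print(f"SLI: {sli:.4%}")
print(f"Budget for 30 days: {error_budget_minutes(SLO):.1f} min")
print(f"Budget remaining: {budget_remaining(sli, SLO):.0%}")
```

An alert on how fast the remaining budget is shrinking is usually more useful to mention than an alert on any single failed request.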

Own the gap

"Given more time, I'd first revisit X — it's the tradeoff I'm least confident about." Owning uncertainty reads as senior.

The time budget Slide 8

Pacing is graded even if no one says so. The most common failure is spending 25 minutes on clarifications and never reaching a real design. A 45-minute slot, roughly:

1 · Understand & scope (requirements, scale, constraints): ~8 min
2 · High-level design (boxes, arrows, APIs, data model): ~12 min
3 · Deep dive (one or two components, tradeoffs): ~18 min
4 · Wrap up (bottlenecks, next moves, ops): ~5 min
Buffer / Q&A: ~2 min

For a 60-minute slot, scale each block — but keep the deep-dive share the largest.
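Scaling the budget is plain proportion. Here is a small sketch that stretches the 45-minute split to another slot length; the helper name `scale_budget` is hypothetical, and the rounded values are a guide, not a stopwatch.

```python
from typing import Dict

# The 45-minute split from this section, in minutes.
BUDGET_45: Dict[str, float] = {
    "Understand & scope": 8,
    "High-level design": 12,
    "Deep dive": 18,
    "Wrap up": 5,
    "Buffer / Q&A": 2,
}


def scale_budget(budget: Dict[str, float], slot_minutes: float) -> Dict[str, float]:
    """Scale every phase by the same factor so the proportions are preserved."""
    total = sum(budget.values())
    return {
        phase: round(minutes * slot_minutes / total, 1)
        for phase, minutes in budget.items()
    }


budget_60 = scale_budget(BUDGET_45, 60)
for phase, minutes in budget_60.items():
    print(f"{phase}: ~{minutes} min")
```

Because every phase is multiplied by the same factor, the deep dive stays the largest block at about 24 of 60 minutes.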

Communicate — how you talk is the answer Slide 9

A silent ten minutes at the whiteboard is the most expensive ten minutes of the round — the interviewer can only score what they hear.

Red flags that sink strong candidates Slide 10

Jumping to a solution

Hearing "design Twitter" and drawing Kafka + Cassandra + CDN before asking how many users or which features. Pattern-matching isn't design.

Ignoring scale

A clean architecture that quietly assumes a single DB, single region, a few hundred users. They're waiting for the back-of-envelope math.

Verdicts without tradeoffs

"I'd use Cassandra." Why not DynamoDB? Why not sharded Postgres? A choice with no stated cost reads as cargo-culted.

Defensiveness under pushback

Treating questions as attacks. Rigidity here is read — accurately — as how you'll behave in real design reviews.

What "strong hire" looks like Slide 11

Rubrics vary by company, but the shape is consistent — and it's rarely about the "right" architecture.

Tier | Behaviour the interviewer observes
Strong hire | Drives the conversation. Quantifies before designing. Proposes multiple options, picks one, names the cost. Catches their own mistakes. Lands a coherent design with a real deep dive and a clear next step.
Hire | Reaches a working design with light prompting. Knows the building blocks. Tradeoffs appear mostly when asked; the deep dive is competent but not unprompted.
Mixed | Needs steering through each step. Picks reasonable components but can't defend them. Misses scale or consistency until prompted. Deep dive stays superficial.
No hire | Skips clarification. Latches onto one technology. Can't explain why one choice beats another. Becomes defensive, or runs out of things to say.

Active recall

Cover the answers. Say each one out loud before you check.

Q: What is the interviewer actually grading?
Hint: Not the diagram.
A: Your thought process under ambiguity: problem framing, component reasoning, tradeoff fluency, and collaboration. The final architecture matters far less.

Q: Name the four steps, in order.
Hint: Same spine, every prompt.
A: 1) Understand & scope, 2) High-level design, 3) Deep dive, 4) Wrap up. Roughly 8 / 12 / 18 / 5 minutes of a 45-minute slot.

Q: Why are clarifying questions so important in step 1?
Hint: Biggest source of lost rounds.
A: They are the design: they stop you from solving a problem nobody asked you to solve. Surface assumptions out loud and write the bounded scope on the board.

Q: The interviewer asks "what about X?" mid-dive. What do you do?
Hint: It's not curiosity.
A: Follow the pointer immediately; it's routing, not interruption. Drop the current thread without defending it.

Q: What makes a choice read as "cargo-culted"?
Hint: Something missing.
A: A verdict with no stated tradeoff ("I'd use Cassandra," full stop). Always name what you gave up: latency, cost, consistency, or operational load.

Q: A strong closing line for the wrap-up?
Hint: Own something.
A: "Given more time, I'd first revisit X; it's the tradeoff I'm least confident about." Owning uncertainty reads as senior.

Check yourself

Q1 What is the interviewer primarily evaluating?
Why: No real system fits on a whiteboard in an hour. The signal is framing, tradeoffs, and collaboration — not the diagram.
Q2 What should you do before drawing any boxes?
Why: Step 1 turns a vague prompt into a bounded one. Skipping it is the most common way to lose the round.
Q3 In a 45-minute slot, which phase deserves the largest share?
Why: ~18 of 45 minutes. The deep dive is where most signal is produced — protect time for it.
Q4 The interviewer asks "what about X?" while you're mid-dive. Best response?
Why: A redirect is routing toward the signal they want. Take it gracefully, without defensiveness.
Q5 Stating a technology choice with no tradeoff named reads as…
Why: Every choice has a cost. Naming it is the highest-signal habit; omitting it suggests pattern-matching, not reasoning.