Chapter 15 System Design Notebook
Chapter Fifteen

Design Google Drive

Building a planet-scale file storage and synchronization service: how bytes flow from a laptop to an object store, how edits ripple to every device a user owns, and how to do all of it without re-uploading what we already have.

Topic: Cloud Storage & Sync
Scope: Architecture + Trade-offs
Slides: 13
Ch 15 · Design Google Drive Slide 02

What does a file service actually owe its users?

Before naming any technology, list the user-visible behaviours the system must guarantee. These five together define the problem.

01

Upload & download

Any file type, sizes from a few KB up to multi-gigabyte videos. Resumable on flaky networks; never lose bytes mid-flight.

02

Sync across devices

An edit on the laptop appears on the phone seconds later. Each client converges to the same view without manual refresh.

03

Share with others

Generate a link, invite by email, assign a role. Permissions propagate to nested folders and respect revocation.

04

Version history

Every save is recoverable. Users can roll back yesterday's accidental delete without contacting support.

05

Non-functional

Strong durability (11 nines), eventual consistency across devices, sub-second metadata reads, bandwidth-efficient sync, and per-user encryption at rest.


A bird's-eye view of the moving parts

Five services collaborate. The client treats the cloud like a remote disk; behind the curtain, metadata and bytes live in different stores so each can be scaled, replicated, and optimised on its own terms.

Clients
  • Desktop Client: watcher · diff · chunker, local cache
  • Mobile App: streaming reads, on-demand sync
  • Web Browser: JS uploader, previews

Services
  • API Gateway: auth · routing, rate limiting
  • Sync Service: orchestrates upload, resolves diffs
  • Metadata DB: files · versions, permissions · blocks
  • Notification Svc: pub/sub fanout to other devices
  • Block Storage: object store, keyed by content hash, replicated 3x
  • Edge / CDN: cached downloads close to user

Connections labelled in the diagram: HTTPS / gRPC · writes blocks · lookup hash · push events

Cut every file into fixed-size blocks

Treating a 4 GB video as one giant blob is a recipe for retries, timeouts, and wasted bandwidth. Slice it into uniform 4 MB blocks and the problem collapses into many tiny, parallelisable, retriable transfers.

Diagram: presentation.key · 18.2 MB, one logical file split into 4 MB blocks

  • Block 1 · 4 MB · SHA-256 a3f1...4d
  • Block 2 · 4 MB · SHA-256 7b2c...9a
  • Block 3 · 4 MB · SHA-256 9e44...11
  • Block 4 · 4 MB · SHA-256 02fa...7c
  • Block 5 · 2.2 MB · SHA-256 cc81...e0

Each block is hashed (SHA-256), then all blocks are uploaded in parallel to Block Storage (Object Store), where key = block hash and value = bytes.

Why fixed-size blocks?

Parallelism. A 1 GB file becomes 250 four-megabyte uploads that can saturate the user's pipe instead of starving on a single TCP stream.

Retriability. If one block fails the integrity check, retry just that block — no need to re-send the other 249.

Streaming. The client can start playing a video as soon as the first few blocks land, instead of waiting for the whole file.

Why 4 MB? Big enough that per-block overhead (HTTP headers, metadata rows) stays small. Small enough that a single failed retry costs little. Dropbox famously picked 4 MB; values from 1–16 MB all work.
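
The slicing and hashing step can be sketched in a few lines of Python. This is a minimal illustration rather than any product's actual client code; the names `iter_blocks`, `block_hashes`, and `BLOCK_SIZE` are hypothetical, and the whole file is assumed to fit in memory for simplicity.

```python
import hashlib
from typing import Iterator, List, Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the block size used on this slide


def iter_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (sha256_hex, block_bytes) for each fixed-size slice of data.

    The final block may be shorter than block_size.
    """
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield hashlib.sha256(block).hexdigest(), block


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return the ordered list of block hashes that describes a file."""
    return [digest for digest, _ in iter_blocks(data, block_size)]
```

Each yielded pair is independent of the others, which is what makes the uploads parallelisable and individually retriable. A real client would read from disk in a streaming fashion instead of holding the file in memory.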

parallel · resumable · streamable

Hash the block, store it exactly once

The same PDF lives in a thousand inboxes. The same npm tarball lives in a million repos. Identical bytes should be stored a single time, globally, across every user.

The trick

Before uploading a block, hash it. Ask the server: do you already have this hash? If yes, skip the byte transfer entirely and just point this user's file at the existing block.

Two layers of dedup

  • Per-user dedup — a user uploading the same attachment twice pays for it once.
  • Global dedup — across all tenants, popular blocks (templates, common libraries, identical photos) live in storage exactly once.

The cost

A reference counter per block: when the last file pointing at a block is deleted, the block itself can finally be garbage-collected. Privacy-sensitive deployments often disable cross-tenant dedup so one user can't probe whether another stores a given file.
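
The "ask before you send" exchange and the reference counter can be sketched as follows. This is an in-memory stand-in under stated assumptions: `BlockStore`, `upload_file`, and their methods are hypothetical names, and a real service would split the `has` check and the byte transfer across a network call.

```python
import hashlib
from typing import Dict, Iterable


class BlockStore:
    """In-memory stand-in for a content-addressed store with ref counts."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.ref_count: Dict[str, int] = {}

    def has(self, block_hash: str) -> bool:
        return block_hash in self.blobs

    def put(self, block_hash: str, data: bytes) -> None:
        # Writing an existing key is a no-op: same hash, same bytes.
        self.blobs.setdefault(block_hash, data)

    def add_ref(self, block_hash: str) -> None:
        self.ref_count[block_hash] = self.ref_count.get(block_hash, 0) + 1

    def release(self, block_hash: str) -> None:
        """Drop one reference; delete the block once nothing points at it."""
        self.ref_count[block_hash] -= 1
        if self.ref_count[block_hash] == 0:
            del self.ref_count[block_hash]
            del self.blobs[block_hash]


def upload_file(store: BlockStore, blocks: Iterable[bytes]) -> int:
    """Reference every block of a file; return how many were actually sent."""
    sent = 0
    for block in blocks:
        h = hashlib.sha256(block).hexdigest()
        if not store.has(h):      # "do you already have this hash?"
            store.put(h, block)   # only now do bytes cross the wire
            sent += 1
        store.add_ref(h)          # point this file at the block
    return sent
```

Replaying the Alice, Bob, and Carol example from this slide through `upload_file` sends 3, 1, and 2 blocks respectively: nine logical references, six stored blocks.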

SHA-256 · ref-count GC · privacy trade-off

Diagram: three users sharing one global, deduplicated Block Store

  • Alice: B1 (a3f1), B2 (7b2c), B3 (9e44)
  • Bob: B1 (a3f1), B2 (7b2c), B4 (02fa)
  • Carol: B1 (a3f1), B5 (cc81), B6 (f009)

Block Store contents: a3f1... refs: 3 · 7b2c... refs: 2 · 9e44... refs: 1 · 02fa... refs: 1 · cc81... refs: 1 · f009... refs: 1

9 logical blocks across 3 users → 6 physical blocks stored

Only re-upload what actually changed

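A user edits one paragraph in a 200-page document and hits save. A naive sync re-uploads the whole file. A delta-aware sync re-uploads only the blocks whose hashes changed, usually one or two. In the five-block example here that is 4 MB instead of 20 MB (80% less), and the saving grows with file size. Fixed-size blocks give this result for in-place edits; an insertion that shifts later bytes changes every block after it.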

Before edit · version 1: B1 a3f1... · B2 7b2c... · B3 9e44... · B4 02fa... · B5 cc81... (4 MB each)

User edits paragraph in block 3 →

After edit · version 2: B1 a3f1... unchanged · B2 7b2c... unchanged · B3' bd71... NEW · B4 02fa... unchanged · B5 cc81... unchanged

  • Naive whole-file sync: re-upload all 20 MB (B1 + B2 + B3' + B4 + B5)
  • Delta sync: upload only 4 MB (B3', the changed block)

How the client decides:
  1. Rechunk & hash the local file
  2. Compare hash list with server's hash list
  3. Upload only the new hashes
  4. Commit a new file pointer to those blocks
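
Steps 2 and 3 of the client's decision reduce to a set difference over hash lists. The sketch below assumes both lists are already available as hex strings; `blocks_to_upload` is a hypothetical helper name, not part of any real sync API.

```python
from typing import List


def blocks_to_upload(local_hashes: List[str], server_hashes: List[str]) -> List[str]:
    """Return the local block hashes the server does not yet know about.

    Order follows the local file, and each missing hash appears once even
    if the same block occurs several times in the file.
    """
    known = set(server_hashes)
    queued = set()
    missing = []
    for h in local_hashes:
        if h not in known and h not in queued:
            queued.add(h)
            missing.append(h)
    return missing
```

With the hashes from the example, comparing version 2 against version 1 returns only `bd71`. The commit in step 4 then stores the full ordered list of version 2 as the new block list.
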

The metadata DB is where the file actually lives

Blocks in the object store are anonymous bytes. The metadata DB gives them names, parents, owners, version history, and permissions — everything the user thinks of as "my file."

files (indexed on owner_id, parent_id)
column           type    notes
file_id          uuid    primary key
owner_id         uuid    creator
parent_id        uuid    folder it sits in
name             string  display name
current_version  uuid    → versions.version_id
is_deleted       bool    soft-delete flag

versions (append-only)
column      type           notes
version_id  uuid           primary key
file_id     uuid           which file
block_list  jsonb / array  ordered hashes
size_bytes  int64          full file size
created_at  timestamp
created_by  uuid           which device

blocks (content-addressed)
column       type             notes
block_hash   string (sha256)  primary key
size_bytes   int              actual size
ref_count    int              GC when 0
storage_key  string           S3 location

permissions (access control)
column        type       notes
file_id       uuid
principal_id  uuid       user or group
role          enum       viewer / commenter / editor / owner
granted_at    timestamp

Sharding key: owner_id for files/versions — keeps a user's data on a single shard for fast folder reads. The blocks table is sharded by block_hash prefix since it is global.
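
The two sharding rules can be illustrated with a pair of routing functions. This is a sketch under assumptions the slide does not state: the shard counts are invented, `metadata_shard` and `block_shard` are hypothetical names, and the first two hex characters of the block hash are taken as the "prefix".

```python
import hashlib
import uuid

NUM_METADATA_SHARDS = 64  # assumed value, for illustration only
NUM_BLOCK_SHARDS = 256    # assumed: one shard per leading byte of the hash


def metadata_shard(owner_id: uuid.UUID) -> int:
    """Route files/versions rows by owner so one user's tree shares a shard."""
    digest = hashlib.sha256(owner_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % NUM_METADATA_SHARDS


def block_shard(block_hash: str) -> int:
    """Route a blocks row by the first byte (two hex chars) of its hash."""
    return int(block_hash[:2], 16) % NUM_BLOCK_SHARDS
```

Because SHA-256 output is close to uniformly distributed, a hash prefix spreads the global blocks table evenly without any knowledge of who owns the block.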


Block storage: an object store keyed by content

The bytes themselves go nowhere near a relational database. They land in an object store — S3, GCS, or an in-house equivalent — where the key is not a filename but the cryptographic hash of the contents.

Why content-addressed?

Dedup falls out for free. Two identical blocks map to the same key. A second write is a no-op.

Tamper-evident. Re-hash on read; if it doesn't match the key, the storage layer corrupted it. Self-healing replication can pick a different replica.

Cache-friendly. The hash is the etag. CDNs and clients can cache aggressively because the key never refers to different bytes.
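
The tamper-evidence property is easy to demonstrate: re-hash what a replica returns and fall through to the next replica on a mismatch. In this sketch each replica is modelled as a plain dictionary, and `read_verified` is a hypothetical name; a real object store performs this check internally or via checksums.

```python
import hashlib
from typing import Dict, List, Optional


def read_verified(block_hash: str, replicas: List[Dict[str, bytes]]) -> Optional[bytes]:
    """Return the first replica copy whose SHA-256 matches its key.

    Returns None if every replica is missing the block or holds bytes
    that no longer hash to the key.
    """
    for replica in replicas:
        data = replica.get(block_hash)
        if data is not None and hashlib.sha256(data).hexdigest() == block_hash:
            return data
    return None
```

A production system would also schedule repair of the bad replica after a successful fallback read; that step is omitted here.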

Tiering

  • Hot tier: SSD-backed, ~1 ms latency. Recent or frequently-read blocks.
  • Warm tier: HDD-backed, ~10 ms latency. Older versions, archived files.
  • Cold tier: tape/Glacier, minutes to hours to restore. Long-retention compliance copies.

Lifecycle policies migrate blocks down the tiers based on last-read timestamp. Reference counts in metadata drive the eventual delete.
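
A lifecycle policy of this kind is essentially a threshold function over the last-read timestamp. The sketch below uses invented thresholds (30 days and 365 days) purely for illustration; the slide does not specify any, and `target_tier` is a hypothetical name.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=30)    # assumed threshold
WARM_WINDOW = timedelta(days=365)  # assumed threshold


def target_tier(last_read_at: datetime, now: datetime) -> str:
    """Pick the storage tier a block should live in, based on read recency."""
    age = now - last_read_at
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"
```

A background job would compare this result with the block's current tier and move the bytes only when they differ.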

3x replicated · erasure coded (cold) · encrypted at rest

KEY: a3f1c8...4d2e9b (SHA-256 of block)

Object Store: 3 replicas, distinct AZs
  • AZ east-1a: replica 1, disk-7 / sector 0x4f
  • AZ east-1b: replica 2, disk-2 / sector 0x9c
  • AZ east-1c: replica 3, disk-5 / sector 0x21

Block lifecycle
  • write: first user uploads → ref_count = 1
  • share: others reference same hash → ref_count++
  • delete: ref_count → 0 → background GC sweeps it

Tell every other device, fast

An upload isn't done when the bytes land — it's done when the user's phone, tablet, and second laptop all know about it. The notification service fans the event out to every subscribed device.

Diagram: fan-out after a commit

  1. Laptop A: user just saved report.docx v7 and sends "committed" to the Sync Service.
  2. Sync Service: writes metadata, emits event.
  3. Notification Svc: topic user-42, subscribers: 4, delivery via long-poll · WS · push.

Fan-out targets:
  • Phone: pulls v7 diff → downloads 1 block
  • Tablet: notified, fetches when reopened
  • Laptop B: WebSocket open → instant push
  • Old Laptop (offline): missed event, picks up via cursor on reconnect

Delivery model: each client maintains a monotonic sync_cursor; on reconnect it asks "what changed since cursor X?", so push is an optimisation, not a correctness requirement. Long-polling is the universal fallback (works behind NATs and corporate firewalls). WebSockets / SSE preferred when available.
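
The cursor mechanism can be sketched as an append-only log per user, where a cursor is simply the number of events a client has already seen. `ChangeLog` and its methods are hypothetical names, and the event payload shape is an assumption.

```python
from typing import Dict, List, Tuple


class ChangeLog:
    """Per-user append-only event log addressed by a monotonic cursor."""

    def __init__(self) -> None:
        self._events: List[Dict[str, str]] = []

    def append(self, event: Dict[str, str]) -> int:
        """Record an event and return the cursor that now includes it."""
        self._events.append(event)
        return len(self._events)

    def changes_since(self, cursor: int) -> Tuple[List[Dict[str, str]], int]:
        """Answer "what changed since cursor X?" plus the new cursor."""
        return self._events[cursor:], len(self._events)
```

An offline device that last saw cursor 1 reconnects, calls `changes_since(1)`, applies whatever comes back, and stores the returned cursor. Whether any push notification reached it in between makes no difference to the result.
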

Immutable blocks, mutable pointers

Versioning gets cheap and correct if you commit to one rule: blocks never change. A new version is just a new ordered list of block hashes; old versions still point at their old hashes.

file: report.docx

  • v1 · Apr 03 · 18 MB · [B1, B2, B3, B4]
  • v2 · Apr 04 · 18 MB · [B1, B2, B3', B4]
  • v3 (current) · Apr 06 · 19 MB · [B1, B2, B3', B4, B5]

Blocks in the store: B1 a3f1 · B2 7b2c · B3 9e44 · B3' bd71 · B4 02fa · B5 cc81

Reading any version is just:
  1. load version row → ordered block hashes
  2. GET each hash from object store
  3. concatenate → original file

Rolling back means setting current_version → old version_id. Zero byte movement.

Why immutability wins

Versions become cheap. v2 adds only one new block (B3') to storage; its block list still names every block, but the unchanged ones are referenced, not copied.

Time travel is a pointer flip. "Restore yesterday's version" updates a single row in the metadata DB — no byte movement.

Deletes are safe. Removing a version decrements ref counts; blocks survive until truly orphaned.
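
Reading a version and rolling back can both be shown with plain dictionaries standing in for the metadata rows and the object store. `read_version` and `rollback` are hypothetical names, and the dictionary shapes are assumptions loosely modelled on the schema from the metadata slide.

```python
from typing import Dict, List


def read_version(block_list: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild a file by fetching each block hash in order and concatenating."""
    return b"".join(store[h] for h in block_list)


def rollback(file_row: Dict[str, str], version_id: str) -> None:
    """Restore an old version by repointing the file row. No bytes move."""
    file_row["current_version"] = version_id
```

Since `rollback` never touches the store, restoring an old version costs one metadata write regardless of file size.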

What about retention?

A retention policy caps how many versions to keep (e.g., last 30 days, last 100 versions). When the oldest version row is dropped, the ref_count of every block it references is decremented, and any block that reaches zero becomes eligible for GC.
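
A count-based retention pass might look like the sketch below. It assumes one reference is counted per occurrence of a hash in a version's block list, that versions are ordered oldest first, and that `prune_versions` is a hypothetical helper; an age-based policy would differ only in how the versions to drop are selected.

```python
from typing import Dict, List


def prune_versions(
    versions: List[List[str]], ref_count: Dict[str, int], keep_last: int
) -> List[str]:
    """Drop the oldest versions beyond keep_last.

    versions is ordered oldest first; each entry is a block list.
    Returns the hashes whose ref count reached zero (eligible for GC).
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    garbage: List[str] = []
    while len(versions) > keep_last:
        oldest = versions.pop(0)
        for h in oldest:
            ref_count[h] -= 1
            if ref_count[h] == 0:
                garbage.append(h)
    return garbage
```

Applied to the three versions of report.docx with `keep_last=2`, only the original B3 becomes garbage, because every other block of v1 is still referenced by v2 or v3.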

Edge: rename

A rename touches only files.name — the version chain and the bytes are unaffected. Moving across folders is just a parent_id update.


When two devices edit the same file

Two laptops go offline, both edit budget.xlsx, both reconnect. The server now has to pick — or refuse to pick — a winner. The right answer depends on whether silently dropping work is acceptable.

Strategy A

Last-write-wins (LWW)

How: Tag each upload with a server-assigned timestamp or monotonic version number. Whichever arrives last becomes the current version.

Cost: The losing device's edits are demoted to an older version. They are not lost — version history preserves them — but they no longer appear as the live file.

Good for: documents where users expect a single canonical "latest", and the chance of true simultaneous edits is low (most consumer use cases).

Strategy B

Keep-both (fork)

How: Detect that both clients diverged from the same base version. Instead of choosing, create budget (conflicted copy from Laptop B).xlsx alongside the original.

Cost: Users see two files and must reconcile manually. Annoying — but no edits silently disappear from the file tree.

Good for: spreadsheets, code, anything where merging unseen changes is unsafe. This is Dropbox's classic default.

Detecting the conflict at all

Every upload carries the parent_version_id it was derived from. If a client tries to commit v3 on top of v2 but the server has already advanced to v3', the server knows the client's parent is stale. That's the trigger — without it, both strategies collapse to "blindly overwrite."
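
The stale-parent check and the two strategies can be combined in one small sketch. Versions are modelled as integers for brevity, and `FileHead`, `CommitResult`, and the `policy` strings are hypothetical; the conflicted-copy name follows the pattern shown under Strategy B.

```python
import os
from dataclasses import dataclass


@dataclass
class CommitResult:
    stored_as: str   # file name the upload ended up under
    version: int     # version number of that file after the commit
    conflict: bool   # True if the client's parent was stale


class FileHead:
    """Server-side view of one file's current version."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.version = 1

    def commit(self, parent_version: int, device: str, policy: str = "keep_both") -> CommitResult:
        if policy not in ("lww", "keep_both"):
            raise ValueError("unknown policy: " + policy)
        stale = parent_version != self.version
        if stale and policy == "keep_both":
            # Fork: leave the live file alone, store the upload beside it.
            stem, ext = os.path.splitext(self.name)
            fork = f"{stem} (conflicted copy from {device}){ext}"
            return CommitResult(stored_as=fork, version=1, conflict=True)
        # Fresh parent, or last-write-wins: the upload becomes the live file.
        self.version += 1
        return CommitResult(stored_as=self.name, version=self.version, conflict=stale)
```

Without the `parent_version` argument the server could not compute `stale`, and both policies would degrade to an unconditional overwrite.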

vector clock (per-device counters) · CRDTs for rich-text docs · operational transform for live editing

For real-time collaborative editing (Docs, Sheets), file-level conflict resolution gives way to operation-level merging — every keystroke is an op that gets transformed against concurrent ops. That's a different problem and lives one layer above the storage system.


Sharing, links, and who is allowed to do what

Permissions are the most subtle correctness problem in a file system. A single bad row leaks private data; an over-eager cache leaks stale private data. Get the model right before scaling it.

Two share modes

  • Per-principal — "Bob can edit this folder." Stored as a row in permissions tying a user (or group) to a role.
  • Link-based — "Anyone with this URL can view." The URL embeds an opaque, unguessable token (e.g., 128-bit random) that the server looks up. Revoking a link rotates or invalidates the token.
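
Link-based sharing can be sketched with Python's standard `secrets` module, which is designed for generating unguessable tokens. The `LinkShares` class and its method names are hypothetical, and the in-memory dictionary stands in for a server-side table.

```python
import secrets
from typing import Dict, Optional


class LinkShares:
    """Maps opaque share tokens to file IDs."""

    def __init__(self) -> None:
        self._file_by_token: Dict[str, str] = {}

    def create(self, file_id: str) -> str:
        """Issue a new token carrying 128 bits of randomness."""
        token = secrets.token_urlsafe(16)  # 16 random bytes = 128 bits
        self._file_by_token[token] = file_id
        return token

    def resolve(self, token: str) -> Optional[str]:
        """Return the shared file ID, or None for unknown or revoked tokens."""
        return self._file_by_token.get(token)

    def revoke(self, token: str) -> None:
        """Invalidate a link; revoking an unknown token is a no-op."""
        self._file_by_token.pop(token, None)
```

The token carries no information about the file, so the server-side lookup is the only place access is decided, and deleting the row is enough to revoke the link.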

Roles, smallest set that works

  • Viewer — read & download.
  • Commenter — viewer + leave comments.
  • Editor — read + write + create new versions.
  • Owner — editor + reshare + delete + change permissions.

Inheritance

Folder-level grants cascade to children. An explicit grant on a child can extend the inherited access but never narrow it, which prevents the confusing situation where a user has access to a folder but mysteriously lacks access to a file inside it.

The check, on every request

can_access(user, file, action):
  1. walk file → parent → ... → root
  2. collect grants where principal is the user or one of the user's groups
  3. take the highest role found; allow if that role permits the action
  4. deny otherwise

This walk happens on every metadata read, so the path-to-grants index has to be hot. A common pattern: denormalise an effective_acl column on each file, recomputed when ancestor grants change.
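
The ancestor walk can be written out as a short Python function. Several things here are assumptions for illustration: the roles are treated as a strict ladder (viewer < commenter < editor < owner), the action names in `REQUIRED_ROLE` are invented, and the folder tree and grants are passed in as dictionaries instead of being read from the metadata DB.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed linear ordering of the four roles listed on this slide.
ROLE_RANK = {"viewer": 1, "commenter": 2, "editor": 3, "owner": 4}

# Hypothetical action names mapped to the minimum role that permits them.
REQUIRED_ROLE = {
    "read": "viewer",
    "comment": "commenter",
    "write": "editor",
    "share": "owner",
    "delete": "owner",
}


def can_access(
    principals: Set[str],                       # the user ID plus their group IDs
    file_id: str,
    action: str,
    parent_of: Dict[str, Optional[str]],        # node -> parent, root -> None
    grants: Dict[str, List[Tuple[str, str]]],   # node -> [(principal, role)]
) -> bool:
    """Walk file -> root, keep the highest matching role, compare to action."""
    best = 0
    node: Optional[str] = file_id
    while node is not None:
        for principal, role in grants.get(node, []):
            if principal in principals:
                best = max(best, ROLE_RANK[role])
        node = parent_of.get(node)
    return best >= ROLE_RANK[REQUIRED_ROLE[action]]
```

Taking the maximum over the whole path is what makes child grants extend but never narrow inherited access. The function assumes `parent_of` describes a tree; a cycle would loop forever, so a real implementation would guard against that.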

Audit & revocation

Every grant change is logged with actor, target, before, and after. Revocation is immediate for token-based checks but takes up to the cache TTL for denormalised ACLs — usually seconds, which is acceptable for files that aren't actively under attack.

RBAC · capability URLs · cascading grants · audit log
Chapter 15 · Design Google Drive Summary

Six principles that shape a file-sync system

Strip away the specifics and the same handful of ideas keep returning. Each is a lever you can adjust to trade bandwidth, latency, durability, or privacy against one another.

01

Separate bytes from metadata

The object store holds anonymous blocks; the metadata DB holds the names, trees, and ACLs. Each scales on its own axis.

02

Chunk first, hash always

Fixed-size blocks plus content-addressing turn one hard problem (giant file transfers) into many easy ones.

03

Never move bytes you can avoid

Dedup eliminates duplicate writes; delta sync eliminates redundant uploads; pointer-flips eliminate copies.

04

Make blocks immutable

Cheap versions, safe deletes, simple caching, and bullet-proof integrity all fall out of one rule.

05

Push is an optimisation, cursors are correctness

Every client should converge on reconnect via a monotonic cursor; real-time notifications only make convergence faster.

06

Decide your conflict policy up front

Last-write-wins is simple but can hide work; keep-both is safer but noisier. Pick deliberately, document loudly.