A walk through the architecture behind a planet-scale video-sharing
service — how raw uploads are turned into adaptive streams, how those
bytes travel through storage tiers and a global CDN, and how metadata
like views and comments stays consistent while the video files
themselves live somewhere very different.
Slide 02 · Scope
What we are actually building
Before any boxes-and-arrows, pin down what the system must do and what
it must survive. We will treat the workload as read-heavy, write-light,
and globally distributed.
Functional
Creators upload a video file from web or mobile.
Viewers watch on phones, laptops, TVs and slow connections.
Comments, likes, subscriptions and basic social state.
Personalised home and "up next" recommendations.
Search by title, channel and topic.
Out of scope (today)
Live streaming, monetisation flows, copyright matching, and
studio-grade editing tools. Each deserves its own design.
Non-functional
Billions of registered viewers, hundreds of millions active daily.
Playback start under two seconds at the 95th percentile.
Eleven nines of durability for stored masters.
Reads dominate writes by roughly two orders of magnitude.
Graceful degradation: a comment outage must not stop playback.
Workload shape
Many short watches, very few uploads. The hot tail (last 24 hours)
drives almost all bandwidth; the long tail dominates storage.
Slide 03 · Back-of-envelope
How big is "big" here?
Numbers below are rough sketches to size the system — they are not
benchmarks of any real service. The point is to find the order of
magnitude that the architecture must respect.
~720 h
New uploads / minute
Across all creators. Each minute of source video
averages around 120 MB after capture.
~5 PB
Raw ingest / day
720 hours × 60 min × 120 MB × 1440 ≈ 7.5 PB raw
before encoding. Renditions add roughly another 1×.
~3.5 EB
Net storage / year
After dedup, garbage collection of failed uploads,
and compression of long-tail content into cheaper codecs.
~1 B
Daily playback sessions
Average session pulls 4 Mbps for 8 minutes ≈
240 MB egress per session at typical HD quality.
~240 PB
Egress / day from edge
Most of this never touches the origin — the CDN
absorbs well above 95% of bytes.
~22 Tbps
Peak global bandwidth
Diurnal peaks land in the evening of each large
timezone. We design for the union of those peaks.
Slide 04 · The big picture
High-level architecture
Two data planes run side by side. A control plane carries metadata —
titles, captions, comments, view counts — while a separate blob plane
moves the actual pixels through encoders, storage tiers and the CDN.
Slide 05 · Ingest
Upload flow — chunked, resumable, idempotent
An hour-long 4K file is gigabytes of bytes traveling over a flaky
hotel Wi-Fi. Treat the upload as many small commitments, not one big
bet, and let the client retry without re-sending what already landed.
Step by step
Init — client calls the upload service, receives an
uploadId plus pre-signed chunk URLs.
Chunk — file is split into ~8 MB parts, each uploaded
directly to object storage with its own SHA-256.
Resume — on retry the client asks "which offsets did
you receive?" and only re-sends gaps.
Commit — server-side multipart finalisation atomically
stitches parts into one immutable object.
Notify — a message lands on the transcoding queue; the
upload API returns a watch-later placeholder.
Why it matters
Idempotent uploads survive flaky links, and direct-to-blob
transfers keep the API tier out of the data path.
Slide 06 · Encoding
Transcoding pipeline as a DAG
A raw master fans out into many derived artifacts. Each node is a
pure function over its inputs; missing outputs are simply rebuilt by
the scheduler.
Slide 07 · Why so many copies?
One source, many renditions
A flagship phone on a fibre link and a five-year-old tablet on a
crowded café Wi-Fi cannot use the same file. The transcoder builds
a ladder of versions; the player picks the rung that fits right now.
Devices vary wildly
Codec support, screen size, decoder power, RAM and DRM
envelope all differ. We can't ship one universal artifact.
Networks vary even more
Throughput swings inside a single watch session — elevator,
subway, kitchen, café. Static quality forces either rebuffering
or wasted bytes.
Newer codecs save bytes
AV1 and VP9 cut bandwidth ~30–50% over H.264 but are too slow
for low-end CPUs. We ship multiple codecs so each device picks
its best trade-off.
Slide 08 · Cost shape
Hot, warm, cold — long tail to deep archive
A tiny fraction of videos receive almost all views in their first
week. Everything else becomes long-tail, watched rarely but kept
forever. The storage cost equation breaks unless we move bytes
with their access pattern.
Promotion / demotion
An access-frequency tracker watches reads per object per
hour. Crossing thresholds triggers async copy up or down a tier.
The viewer never blocks on this.
Re-encode the tail
Long-tail videos quietly get re-encoded into more efficient
codecs and lower max-rungs to shrink their footprint. The
master copy stays in archive.
Eleven nines on the master
Renditions are derived data and can always be rebuilt from
the master. We pay top-tier durability only on the source.
Slide 09 · How bytes reach the player
HLS & DASH — manifests over plain HTTP
We do not stream one giant file. Each rendition is sliced into
short segments (2–6 seconds each), and a manifest tells the player
where every segment lives.
Why segment?
Small files cache cleanly at every layer, allow byte-range
fetches, and let the player change quality between segments
without reloading the whole stream.
HLS vs DASH
Both expose a master manifest pointing at per-rendition
playlists. HLS uses .m3u8, DASH uses .mpd.
Modern packagers emit both side by side.
Plain HTTP wins
Everything is GETs of static objects — easy to cache, easy
to scale, friendly to existing CDN infrastructure.
Slide 10 · The player is half the system
Adaptive bitrate selection
Quality decisions happen on the device, segment by segment.
The server just exposes the menu; the client orders the dish that
fits its current appetite.
How the player decides
Times each segment download and updates a rolling
bandwidth estimate.
Watches its own playback buffer — low buffer means
"drop quality now, recover later".
Adds hysteresis so it does not flap on every wobble.
The device sees real-time conditions the server cannot —
radio strength, contention, throttling — and reacts within a
single segment.
Slide 11 · The last mile
CDN — edge POPs, mid-tier shield, origin
A three-layer cache hierarchy puts bytes seconds away from every
viewer while protecting origin storage from the herd. The trick is
collapsing cache misses so the origin sees one request, not a million.
Slide 12 · Two databases, two destinies
Metadata and blobs live apart
Titles, comments, view counts and recommendations are small
and accessed via complex queries. Video bytes are huge and
accessed by primary key only. Mixing them is the fastest way to a
crashed database.
Why split?
Different access patterns, different durability budgets,
different scaling axes. Metadata grows with users; blobs grow
with hours of video.
The link is just a key
The metadata row stores an object path. Replacing the blob
backend later is a one-table migration.
Failure isolation
A metadata outage might hide titles, but the CDN keeps
serving bytes to ongoing watches. Each plane can scale and fail
on its own clock.
Slide 13 · Counting at scale
View counts — async aggregation
A trending video can attract a million plays per minute. Writing
one row per view into a relational table will incinerate the
database. Counts are eventually-consistent metrics, not transactions.
Slide 14 · Closing
Six principles to take with you
A video platform is many systems wearing a trench coat. The patterns
below repeat at every layer — keep them in mind when sketching the
next read-heavy, blob-heavy service.
1
Separate the byte plane from the control plane
Metadata and blobs scale and fail on different curves. Glue
them with a key, not a join.
2
Treat heavy work as a DAG of small, retryable jobs
Transcoding, packaging and reconciliation are all idempotent
steps that can be rebuilt on demand.
3
Push intelligence to the edges
The CDN absorbs the bytes, the player picks the quality, the
origin only sees the misses that survive both.
4
Move bytes with their access pattern
Hot at the edge, warm regionally, cold in deep storage. Pay
top-tier durability only on the master.
5
Counts are streams, not transactions
Views, likes and trending are async aggregates. Show a fast
approximation now, reconcile the truth later.
6
Design every component to fail without taking the watch with it
Comments down, recommendations down, search down — playback
must keep rolling.