Suggest likely query completions while the user is still typing. The whole challenge is
shipping the right list in under a hundred milliseconds, again and again, across billions
of keystrokes.
swiss jobs — full stack swiss alps
Slide 02 · Scope
What the system must do
Autocomplete sits in the user's hot path. Every extra millisecond is felt directly while
they type, so the constraints are unusually strict for what looks like a simple lookup.
Functional
Given a prefix string, return the top K matching queries.
Ranked by popularity (with optional personalisation later).
Updates as the user adds or deletes characters.
Multi-language, case-insensitive, tolerant of trailing spaces.
Out of scope (for now)
Spelling correction beyond what the index already absorbs.
Semantic search and embeddings.
Long-form sentence completion.
Non-functional
Latency: end-to-end under 100 ms at p99.
Freshness: trending queries should appear within hours, not days.
Throughput: every keystroke is a request — expect ~10x query QPS.
Availability: graceful degradation, never a 5xx in the typeahead box.
Cost: cheap per lookup; the box is on every page.
Back-of-envelope
10 M DAU × 10 queries/day × 6 keystrokes/query
≈ 600 M requests/day
≈ 7 K QPS average, 20–30 K peak
Slide 03 · Core Data Structure
The trie: a tree of shared prefixes
A trie stores strings character by character. Every word that shares a prefix shares a
path, so finding "all queries starting with swi" is just
walking three edges and reading the subtree underneath.
A toy trie holding "swi", "sus", "cat", "cod" — one path per shared prefix.
Slide 04 · Ranking Signal
Attach a frequency to every terminal
A trie tells us which strings exist. To rank suggestions we also need to know which ones
people actually search for. Store a frequency counter on each terminal
node — the count of times that exact query was issued in the recent window.
What the count represents
The number of times that exact query string was logged over the chosen time window
(often a sliding 30-day count). Newer counts can be weighted higher with exponential
decay so trends bubble up.
Where it gets stored
Only on terminal nodes (intermediate nodes don't have a query).
Updated by an offline aggregation job, not on the live request path.
Can be a small integer once normalised — rank order matters, not absolute counts.
Slide 05 · The Obvious Algorithm
Naïve lookup: walk the prefix, then explore everything below
The textbook algorithm: descend to the prefix node by walking one edge per character, then
do a depth-first search of the subtree, collect every terminal, and sort by frequency.
It works — until the prefix is short.
Steps per request
Walk the prefix: O(P) where P is the prefix length.
DFS the subtree: visit every descendant terminal.
Sort or heap-select the top K by frequency.
Return the K winners.
Why it falls apart
The subtree under s can hold millions of distinct
queries. Doing a full traversal on every keystroke for every user blows the latency
budget by orders of magnitude.
prefix "s" → ~10M nodesDFS > 500 ms
Complexity, at a glance
Step
Cost
Descend to prefix node
O(P)
DFS subtree (N terminals)
O(N)
Top-K via heap
O(N log K)
Total per keystroke
O(P + N log K)
P is tiny but N grows with traffic. For the popular short prefixes, N is exactly
where we cannot afford linear work.
Slide 06 · The Speed Trick
Cache the answer at every prefix node
Instead of recomputing the top K each time, precompute and store the list directly on
every node. A lookup becomes: walk to the node, return its cached list. O(P)
total — constant in the size of the corpus.
Each node along the prefix path "swi" carries its own precomputed top-K. The query is a pointer chase.
Slide 07 · The Trade-off
You bought speed with memory — now prune
Storing K strings at every node multiplies memory cost by roughly K. For a trie of 50 M
nodes and K = 10, that is 500 M cached strings. Pruning techniques bring it back to a
workable footprint.
Where the cost lives
Each prefix node holds an array of K (suggestion, score) pairs.
Long-tail prefixes (10+ chars) hardly serve traffic but still cost RAM.
Strings can be stored as offsets into a single string pool.
Pruning strategies
Min-frequency cutoff: drop queries searched fewer than N times.
Max prefix depth: stop precomputing past depth 6 or so — long
prefixes have few candidates, a quick DFS is fine.
Variable K: store more suggestions for short, hot prefixes; fewer
for deep, rare ones.
Compress the trie: collapse single-child chains into a radix /
Patricia tree.
Slide 08 · Build Pipeline
Build the trie offline from query logs
The serving trie is a derived artifact. A periodic batch job reads raw search logs,
aggregates them into query → count, builds a fresh trie with precomputed top-K, and
publishes it to the serving fleet as an immutable blob.
Slide 09 · Freshness
Two answers to "how fresh is fresh enough?"
Batch rebuilds give you new trends every few hours. For most products that is fine.
When breaking news or live events demand minute-level freshness, layer a streaming
update path on top.
Path A · Delayed batch refresh
simplecheapstale ~hours
Rebuild the trie every 1–6 hours from the rolling log window.
Atomic swap into serving nodes — readers never see a partial state.
Good default. Most queries don't trend that fast.
Path B · Streaming top-up
fresh ~minutesmore complextwo systems to reason about
Tail the query log via Kafka / Flink in real time.
Maintain a small delta trie of recently-spiking queries.
At query time, merge the base trie's top-K with the delta's top-K.
Delta is wiped every batch swap.
Slide 10 · Cache Hot Prefixes
Most keystrokes hit the same few hundred prefixes
Query prefixes follow a steep Zipf-style curve. A small cache at the edge or in Redis can
absorb the bulk of traffic, leaving the trie servers responsible for the long tail.
What to cache where
CDN: short, popular prefixes that are the same for everyone. TTL of a few minutes.