System Design · Chapter 12

Design a Chat System

From a single TCP socket to a planet-scale messaging fabric — how WhatsApp, Slack, and Messenger keep billions of conversations live, ordered, and durable.

Real-time · Presence · Fan-out · Push
Requirements 02 · scope before structure

What does the system have to do?

Before drawing boxes, pin down the user-visible features. The chat domain is small in surface area but rich in failure modes — every requirement quietly imposes a different storage or networking constraint.

Feature 01

1:1 chat

Two users exchange messages in near real time. Latency target around 200 ms end-to-end. Messages survive the recipient being offline.

Feature 02

Group chat

Up to a few hundred participants per room. Each member sees the same messages in the same order. Fan-out happens server-side.

Feature 03

Online presence

Show whether a contact is online, away, or last-seen-at. Tolerant of brief disconnects; not a security guarantee.

Feature 04

Message history

Durable, paginated, sorted by time. Searchable on the client. Retention spans years for compliance and user nostalgia.

Feature 05

Push notifications

If the recipient has no live socket, hand the message to APNs / FCM so a notification fires on the lock screen.

Feature 06

Multi-device sync

The same account on phone, desktop, and web sees identical state. Each device tracks its own delivered / read cursors.

Transport 03 · how does the server reach the client?

Choosing the connection model

A chat system stands or falls on its push channel. HTTP request-response is client-pull; real-time chat needs the server to wake up the client. Four options — only one is right.

Approach      | How it works                                                                                      | Latency   | Server load  | Verdict
--------------|---------------------------------------------------------------------------------------------------|-----------|--------------|-------------
Short polling | Client hits /messages every few seconds asking "anything new?". Most calls return empty.          | Poor      | Wasteful     | Reject
Long polling  | Client opens a request that the server holds until a message arrives or a timeout (30 s) fires.   | OK        | Held threads | Fallback
SSE           | One-way stream from server to client over plain HTTP. Client still POSTs to send.                 | Good      | Light        | One-way only
WebSocket     | Full-duplex TCP-like channel over a single upgraded HTTP connection. Bytes flow both ways.        | Excellent | Persistent   | Pick this

Why WebSocket wins · decision

A chat client is bidirectional by definition — it both sends and receives. WebSocket gives one persistent connection per device, framed messages, low overhead per packet, and predictable wake-up semantics on mobile radios. Long polling becomes the graceful fallback on networks that block upgrades.

High-level Architecture 04 · the major services

The four moving parts

Each concern gets its own service so it can scale on its own axis. The chat service holds connections; the rest stay stateless or behind storage.

[Diagram: mobile, web, and desktop clients connect over WebSocket through an L4 / L7 load balancer (sticky by user_id) to the Chat Service, which holds open sockets and routes 1:1 and group messages. Behind it sit the Presence Service (heartbeats + TTL, "who is online"), the Push Service (APNs / FCM gateway for offline notifications), Storage (KV: messages by chat_id; RDBMS: users / groups), and Discovery (user → server).]

Chat Service · stateful

Each instance terminates thousands of live WebSockets. Routes inbound messages to other chat servers, writes to storage, hands off to push when needed.

Presence Service

Tracks the heartbeat of each connected device. Reads are cheap, writes are massive — Redis with TTL keys is the typical home.

Push Service

A thin abstraction over Apple / Google / Web push. Receives async jobs when a recipient has no live socket on any device.

Storage

Hot path: append-only KV store partitioned by chat_id. Cold path: relational store for users, group rosters, profile data.

Service Discovery 05 · who holds whose socket?

Mapping users to chat servers

Each device's socket lives on exactly one chat server at a time. To route a message from Alice to Bob, you have to know which server holds Bob's connection (or connections, if several of his devices are online). A shared registry, usually Redis or ZooKeeper, answers that question.

[Diagram: chat-server-A holds sockets for Alice, Eve, Liu, and Mia; chat-server-B for Bob, Ravi, and Sam; chat-server-C for Zoe and Ana (one socket per user device). The service registry is a Redis HASH of user → server: alice → chat-A, eve → chat-A, mia → chat-A, bob → chat-B, ravi → chat-B, zoe → chat-C. To "deliver to bob", the router on chat-A calls lookup(bob), gets chat-B, and makes a server-to-server RPC to chat-B. Legend: publish on connect · lookup on send · registry (Redis).]

On connect

When a client opens a WebSocket, the receiving chat server writes user_id → self into the registry with a TTL refreshed by heartbeats.

On send

The originating server looks up the recipient. Local? Push the frame directly. Remote? RPC to the holding server, which forwards to its socket.

On disconnect

The TTL lets you survive crashes — a dead server's mappings expire and the next reconnect rewrites the entry on a healthy node.
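
The connect, send, and disconnect steps above can be sketched with an in-memory stand-in for the registry. This is a minimal illustration, not a production design: `SessionRegistry`, `route`, and the injected `clock` are hypothetical names, and a real deployment would use Redis keys with a TTL rather than a Python dict.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SessionRegistry:
    """In-memory stand-in for a Redis mapping of user_id -> chat server, with TTLs."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # user_id -> (server_id, expiry timestamp)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def register(self, user_id: str, server_id: str) -> None:
        """Called on connect, and again on every heartbeat to refresh the TTL."""
        self._entries[user_id] = (server_id, self._clock() + self._ttl)

    def lookup(self, user_id: str) -> Optional[str]:
        """Return the server holding the socket, or None if unknown or expired."""
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        server_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]  # lazy expiry, as a TTL key would behave
            return None
        return server_id

    def unregister(self, user_id: str, server_id: str) -> None:
        """Clean disconnect: remove the mapping only if this server still owns it."""
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] == server_id:
            del self._entries[user_id]


def route(registry: SessionRegistry, self_id: str, recipient: str) -> str:
    """Decide how the originating server should deliver a frame."""
    holder = registry.lookup(recipient)
    if holder is None:
        return "offline"
    return "local" if holder == self_id else f"rpc:{holder}"
```

The ownership check in `unregister` matters when a user reconnects to a new server before the old one notices the drop: the stale server must not delete the fresh mapping.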

Message Flow 06 · the happy and unhappy paths

1:1 chat — sender to recipient

A message must be durable before it is delivered; it must be ordered before it is acknowledged. Two paths diverge based on whether the recipient is currently holding a live socket.

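[Sequence diagram: Alice → chat-A → store / ID generator → chat-B → Bob]

  1. Alice sends "hi bob" to chat-A.
  2. Chat-A allocates a message_id (Snowflake).
  3. Chat-A appends the message to the chat_id partition.
  4. Chat-A acks Alice with (id, ts): "sent".
  5. Chat-A looks up Bob (lookup(bob) → chat-B) and forwards the message by RPC.
  6. Chat-B pushes the message over Bob's socket. If Bob is offline: enqueue → push notification.
  7. Bob's device returns a delivered ack.
  8. The status shown to Alice is updated to "delivered".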

Persist before you deliver

The chat server writes to storage first. Only then does it ack the sender. If delivery fails, the message is still recoverable on reconnect.

One ID, one truth

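An ID generator (Snowflake, ULID, or a per-chat sequence) gives every message a sortable identifier shared by all parties. Only a per-chat sequence is strictly monotonic within a conversation; Snowflake and ULID are time-ordered, so their order is only as good as the clocks that produce them.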

Offline path

No socket for Bob? The message still sits in his inbox. Push service fires APNs / FCM. On next reconnect, Bob's device pulls everything newer than its cursor.
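
The ordering of those steps can be sketched as follows. It is a simplified single-process model under stated assumptions: `ChatServer`, `handle_send`, and `fetch_since` are hypothetical names, lists stand in for sockets and the push queue, and a plain counter stands in for the ID generator.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    message_id: int
    chat_id: str
    sender: str
    body: str


@dataclass
class ChatServer:
    # chat_id -> stored messages (stand-in for the durable KV store)
    store: Dict[str, List[Message]] = field(default_factory=dict)
    # user_id -> frames pushed over a live socket (stand-in for real sockets)
    sockets: Dict[str, List[Message]] = field(default_factory=dict)
    # jobs handed to the push service for offline recipients
    push_queue: List[Message] = field(default_factory=list)
    next_id: int = 1

    def handle_send(self, chat_id: str, sender: str, recipient: str, body: str) -> int:
        # Allocate an ID (a counter here; Snowflake or similar in practice).
        msg = Message(self.next_id, chat_id, sender, body)
        self.next_id += 1
        # Persist first: nothing below runs unless the append succeeded.
        self.store.setdefault(chat_id, []).append(msg)
        # Deliver over a live socket, or fall back to the offline path.
        socket = self.sockets.get(recipient)
        if socket is not None:
            socket.append(msg)
        else:
            self.push_queue.append(msg)
        # The returned ID stands in for the "sent" ack to the sender.
        return msg.message_id

    def fetch_since(self, chat_id: str, cursor: int) -> List[Message]:
        """On reconnect, a device pulls everything newer than its cursor."""
        return [m for m in self.store.get(chat_id, []) if m.message_id > cursor]
```

Because delivery and push both happen after the append, a failure in either leaves the message recoverable through `fetch_since`.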

Storage 07 · partition by conversation

Storing messages at scale

Reads and writes for a chat system are skewed by conversation, not by user. A single chat is the unit of locality — partition there and life gets easier.

Why a KV store, not RDBMS

  • Write-heavy: this design assumes roughly 10× more writes than reads, at which point a single-primary RDBMS becomes the bottleneck.
  • Append-only: messages are never updated, only inserted. No transactions across rows.
  • Recent bias: 99% of reads touch the last few days. Cold data can move to cheaper storage tiers.
  • Horizontal scale: Cassandra, DynamoDB, HBase shard naturally by partition key.

Schema

  • Partition key: chat_id (clusters all messages of one conversation onto one node).
  • Sort key: message_id (Snowflake: time + worker + seq).
  • Value: sender, content, content_type, server_ts, attachments.
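
To make the schema concrete, here is a toy in-memory model of the partition key / sort key layout, including the newest-first pagination that message history needs. `MessageStore` and `page_before` are hypothetical names for illustration; a real store such as Cassandra or DynamoDB provides the equivalent range query natively.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class MessageStore:
    """Toy wide-column layout: partition key chat_id, sort key message_id."""

    def __init__(self) -> None:
        # chat_id -> message_ids kept in ascending order
        self._ids: Dict[str, List[int]] = defaultdict(list)
        # (chat_id, message_id) -> value columns
        self._rows: Dict[Tuple[str, int], dict] = {}

    def append(self, chat_id: str, message_id: int, value: dict) -> None:
        ids = self._ids[chat_id]
        pos = bisect.bisect_left(ids, message_id)
        if pos < len(ids) and ids[pos] == message_id:
            return  # same key written twice: keep the first write
        ids.insert(pos, message_id)
        self._rows[(chat_id, message_id)] = value

    def page_before(self, chat_id: str, before_id: Optional[int],
                    limit: int) -> List[Tuple[int, dict]]:
        """Newest-first page of messages with message_id < before_id."""
        ids = self._ids.get(chat_id, [])
        end = len(ids) if before_id is None else bisect.bisect_left(ids, before_id)
        start = max(0, end - limit)
        return [(i, self._rows[(chat_id, i)]) for i in reversed(ids[start:end])]
```

A client scrolling back through history passes the oldest message_id it already holds as `before_id`, so every page touches a single partition.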

Monotonic IDs

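Within a conversation, IDs must increase. Snowflake-style generators need no coordination on the hot path: each worker mints IDs locally from its clock, its worker ID, and a per-millisecond sequence. After the unused sign bit, the next 41 bits encode a millisecond timestamp, so ordering by ID approximates ordering by time. IDs from one worker are strictly increasing; IDs from different workers are ordered only to within clock skew. Where strict per-conversation order matters, route each chat's writes through one generator or use a per-chat sequence.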
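
A minimal sketch of a Snowflake-style generator, assuming the commonly described 41 / 10 / 12 bit split (timestamp, worker, sequence). `EPOCH_MS` is an arbitrary custom epoch chosen for the example, and `SnowflakeGenerator` and `timestamp_ms` are illustrative names rather than any library's API.

```python
import time
from typing import Callable

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
WORKER_BITS = 10
SEQ_BITS = 12
MAX_SEQ = (1 << SEQ_BITS) - 1


def _wall_clock_ms() -> int:
    return int(time.time() * 1000)


class SnowflakeGenerator:
    """Mints 64-bit IDs laid out as: timestamp | worker_id | sequence."""

    def __init__(self, worker_id: int,
                 clock_ms: Callable[[], int] = _wall_clock_ms) -> None:
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id does not fit in WORKER_BITS")
        self._worker_id = worker_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        # Never step backwards, even if the wall clock does.
        now = max(self._clock_ms(), self._last_ms)
        if now == self._last_ms:
            self._seq = (self._seq + 1) & MAX_SEQ
            if self._seq == 0:
                # Sequence exhausted within this millisecond: borrow the next
                # one. (Many implementations wait for the clock instead.)
                now += 1
        else:
            self._seq = 0
        self._last_ms = now
        return (((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                | (self._worker_id << SEQ_BITS)
                | self._seq)


def timestamp_ms(snowflake_id: int) -> int:
    """Recover the millisecond timestamp embedded in an ID."""
    return (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS
```

IDs from one generator are strictly increasing. Two generators with different worker IDs can interleave within the same millisecond, which is why the ordering across workers is only approximate.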

[Diagram: partition by chat_id, sort by message_id within each partition. node-1 holds chat_id = 1042 (msg_id 9001 alice → "hi bob", 9002 bob → "hey!", 9003 alice → "lunch?", 9004 bob → "1pm", ..., 9421 alice → "see you", 9422 bob → "bye"). node-2 holds group chat_id = 2099 (7711 eve → "kickoff", 7712 liu → "+1", 7713 mia → "agenda?", 7714 sam → "shared"). node-3 holds chat_id = 3304 (5001 ravi → "ping", 5002 zoe → "back?").]

Group Chat 08 · server-side fan-out

From one sender to many readers

In a group, the sender publishes once. The chat service writes the message once to the conversation log, then enqueues it into a per-recipient inbox so each member can fetch and ack independently.

[Diagram: Alice sends "team huddle 5pm". Step 1: the Chat Service writes it once to the conversation log (chat_id = 2099, one row). Step 2: it fans out msg_id 7715 to one inbox per member, each entry kept with a TTL until acked: inbox[bob], inbox[eve], inbox[liu], inbox[mia]. Bob, Eve, and Mia are online; Liu is offline and gets a push notification.]

Why per-recipient inbox

Each member acks independently and at their own pace. Bob may read instantly; Liu reads tomorrow. Per-recipient state keeps "unread" counts trivial and supports clean retries.

Bounded group size

This fan-out works up to a few hundred members. Above that — channels, broadcast rooms — flip the model: members pull from a shared log rather than each receiving a copy.

Single source of order

The conversation log assigns the canonical message_id. Every inbox copy references it, so even if devices receive out of order, the client can sort and dedupe by ID.
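
The write-once, fan-out-per-member pattern might look like this in miniature. `GroupChat`, `publish`, `ack`, and `unread_count` are hypothetical names, and the inboxes hold only message IDs that point back into the single conversation log.

```python
from typing import Dict, Iterable, List


class GroupChat:
    """One conversation log plus one inbox of pending message IDs per member."""

    def __init__(self, chat_id: str, members: Iterable[str]) -> None:
        self.chat_id = chat_id
        self.members = set(members)
        self.log: List[dict] = []  # canonical, ordered copy of each message
        self.inbox: Dict[str, List[int]] = {m: [] for m in self.members}
        self._next_id = 1

    def publish(self, sender: str, body: str) -> int:
        if sender not in self.members:
            raise ValueError(f"{sender} is not a member of {self.chat_id}")
        # The log assigns the canonical message_id and stores the body once.
        message_id = self._next_id
        self._next_id += 1
        self.log.append({"message_id": message_id, "sender": sender, "body": body})
        # Fan out a reference (not a copy of the body) to every other member.
        for member in self.members:
            if member != sender:
                self.inbox[member].append(message_id)
        return message_id

    def ack(self, member: str, message_id: int) -> None:
        """Idempotent: acking twice, or acking an unknown ID, is a no-op."""
        pending = self.inbox[member]
        if message_id in pending:
            pending.remove(message_id)

    def unread_count(self, member: str) -> int:
        return len(self.inbox[member])
```

The cost of `publish` grows linearly with group size, which is the reason the model is flipped to pull for very large rooms.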

Presence 09 · heartbeats and TTL

Online presence — easy to describe, hard to scale

Presence looks trivial: "is this user online?" But every connected device sends a heartbeat every few seconds, and every contact wants the answer in real time. At a billion users, that is the highest QPS surface in the system.

[Diagram: Bob's device heartbeat timeline. Heartbeats arrive at t = 0, 10 s, 20 s, and 30 s, then stop. The TTL runs out at t = 60 s, and a reconnect heartbeat arrives at t = 75 s. Presence state: online → offline (last_seen = 30 s ago) → online again.]

Redis keys with TTL:

  SET presence:bob "online" EX 30
  SET presence:bob "online" EX 30   // refreshed
  SET presence:bob "online" EX 30   // refreshed
  // key expires after 30 s of silence

Subscribers (Bob's contacts) are notified via pub/sub when the state changes:

  • Scope: only friends and recent chats, not the whole graph.
  • Batch updates every few seconds, never per event.

Heartbeat + TTL

Client sends a ping every N seconds. Server writes presence:userId with a TTL of 2–3× the interval. If the key expires, the user is implicitly offline.

Read fan-out is the cost

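A user with 500 contacts triggers 500 presence lookups every time the contact list refreshes; at one refresh per second, that is 500 reads per second for a single user just to render online dots. Solve with caching and by computing presence only for currently visible contacts.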

State change, not state

Don't push presence continuously — push only on transitions (online → offline). Subscribers maintain their own cached view and rely on diffs.
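
Heartbeat-plus-TTL with transition-only notifications can be sketched as below, assuming a 10 s heartbeat interval and a TTL of 3× that. `PresenceTracker` is a hypothetical in-memory model; with Redis, the expiry would come from the key's TTL instead of an explicit `sweep`.

```python
import time
from typing import Callable, Dict, List, Tuple


class PresenceTracker:
    """Tracks heartbeats and records only online/offline transitions."""

    def __init__(self, heartbeat_interval: float = 10.0, ttl_factor: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = heartbeat_interval * ttl_factor
        self._clock = clock
        self._expires_at: Dict[str, float] = {}
        # (user_id, new_state) pairs that would be published to subscribers
        self.events: List[Tuple[str, str]] = []

    def heartbeat(self, user_id: str) -> None:
        now = self._clock()
        expiry = self._expires_at.get(user_id)
        if expiry is not None and expiry <= now:
            # The key lapsed but no sweep has run yet: report the gap first.
            self.events.append((user_id, "offline"))
            expiry = None
        if expiry is None:
            self.events.append((user_id, "online"))
        # A refresh while already online extends the TTL and emits nothing.
        self._expires_at[user_id] = now + self._ttl

    def sweep(self) -> None:
        """Expire silent users; stands in for TTL key expiry."""
        now = self._clock()
        for user_id in [u for u, e in self._expires_at.items() if e <= now]:
            del self._expires_at[user_id]
            self.events.append((user_id, "offline"))

    def is_online(self, user_id: str) -> bool:
        return self._expires_at.get(user_id, float("-inf")) > self._clock()
```

Four heartbeats in a row produce a single "online" event, so subscriber traffic scales with state changes rather than with heartbeat volume.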

Signals 10 · the chrome around the message

Read receipts, typing, delivery status

These ephemeral signals make a chat feel alive. They are cheap individually and brutal in aggregate — design them as best-effort and avoid storing them as first-class messages.

Delivery status

Three checkmark states:

  • Sent — server has durably stored the message and acked the sender.
  • Delivered — at least one recipient device acknowledged receipt over its socket.
  • Read — the recipient's UI opened the conversation and acked the message_id.

Each ack travels over the same WebSocket and updates a small key, not the message row.

Typing indicator

Pure transient signal. The client sends typing-start, the server fans it out to the chat's connected members, and the indicator auto-clears after 5 s if no typing-stop arrives. Never persisted. Drop on overload.

Read receipts

Read state is per-user, per-chat: a single cursor holding the highest read message_id. Storing one row per (user, chat) is far cheaper than one per (user, message). For any message in a group chat, the "read by" count is the number of members whose cursor is at or beyond that message_id.
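
The cursor model is small enough to show in full. This is an illustrative sketch with hypothetical names (`ReadCursors`, `mark_read`, `read_by_count`), using a dict where a real system would use one small row per (user, chat).

```python
from typing import Dict, Iterable, Tuple


class ReadCursors:
    """One 'highest read message_id' per (user, chat)."""

    def __init__(self) -> None:
        self._cursor: Dict[Tuple[str, str], int] = {}

    def mark_read(self, user_id: str, chat_id: str, message_id: int) -> bool:
        """Advance the cursor; return False for stale or duplicate acks."""
        key = (user_id, chat_id)
        if message_id <= self._cursor.get(key, 0):
            return False  # cursors only move forward
        self._cursor[key] = message_id
        return True

    def read_by_count(self, chat_id: str, message_id: int,
                      members: Iterable[str]) -> int:
        """How many of the given members have read up to this message."""
        return sum(1 for m in members
                   if self._cursor.get((m, chat_id), 0) >= message_id)
```

Because the cursor only moves forward, a late or repeated read ack cannot "unread" anything.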

Idempotency at every step

Network retries are the rule, not the exception. Each ack carries the message_id; the server treats duplicate acks as no-ops. The client's outbound queue uses a client-generated client_msg_id to dedupe sends across reconnects.
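
Both halves of that idea, deduplicating sends and treating repeated acks as no-ops, fit in a short sketch. `IdempotentInbox` is a hypothetical name, and the unbounded dict and set are a simplification; a real server would expire old dedupe keys.

```python
from typing import Dict, List, Set, Tuple


class IdempotentInbox:
    """Server-side handling of retried sends and retried acks."""

    def __init__(self) -> None:
        # (sender, client_msg_id) -> message_id already assigned
        self._by_client_id: Dict[Tuple[str, str], int] = {}
        # (message_id, device_id) pairs that have already acked
        self._acked: Set[Tuple[int, str]] = set()
        self._next_id = 1
        self.stored: List[Tuple[int, str]] = []

    def send(self, sender: str, client_msg_id: str, body: str) -> int:
        key = (sender, client_msg_id)
        if key in self._by_client_id:
            # Retry after a lost ack: return the original ID, store nothing.
            return self._by_client_id[key]
        message_id = self._next_id
        self._next_id += 1
        self.stored.append((message_id, body))
        self._by_client_id[key] = message_id
        return message_id

    def ack(self, message_id: int, device_id: str) -> bool:
        """Return True only the first time this device acks this message."""
        key = (message_id, device_id)
        if key in self._acked:
            return False
        self._acked.add(key)
        return True
```

Keying the dedupe table on the sender as well as `client_msg_id` means two clients that happen to generate the same ID do not collide.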

Rule of thumb

If losing the signal would be merely annoying — typing, presence flicker — make it best-effort. If losing it would be wrong — delivery status, ordering — make it durable.

Push 11 · waking up the offline

Push notifications for offline users

When no socket is alive, the app is asleep — possibly killed by the OS. The only way to reach it is through the platform push gateway. The push service is a queue, a worker pool, and a careful authority on what to send.

[Diagram: decoupled push pipeline. The Chat Service, finding no socket for Bob, enqueues a job asynchronously onto the Push Queue (Kafka / SQS, a durable buffer). Push Workers build the payload and token, batch and retry, and call APNs (iOS), FCM (Android), or Web Push. The chat service stays fast and never blocks on a third-party gateway. Cancel-on-read: if Bob's phone reconnects and reads the message before APNs delivers, the push worker checks delivery state and silently drops the notification.]

Device tokens

Each device registers a push token with the chat service. Tokens expire — workers must handle 410 / unregistered responses and purge dead tokens.

Quiet hours and bundling

Don't ring the phone for every message in a busy group. Workers consult per-user preferences and may collapse rapid bursts into a single "5 new messages" notification.
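
A push worker's filtering step might be sketched like this. The function collapses a burst into one notification per (user, chat), skips muted users, and drops jobs for messages that have already been read. `PushJob`, `build_notifications`, and the `read_cursor` / `muted` inputs are hypothetical; the actual gateway calls to APNs or FCM are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class PushJob:
    user_id: str
    chat_id: str
    message_id: int
    preview: str


def build_notifications(jobs: List[PushJob],
                        read_cursor: Dict[Tuple[str, str], int],
                        muted: Set[str]) -> List[dict]:
    """Turn a batch of queued jobs into at most one notification per chat."""
    pending: Dict[Tuple[str, str], List[PushJob]] = {}
    for job in jobs:
        if job.user_id in muted:
            continue  # quiet hours or per-user preference
        if job.message_id <= read_cursor.get((job.user_id, job.chat_id), 0):
            continue  # already read on another device: cancel the push
        pending.setdefault((job.user_id, job.chat_id), []).append(job)

    notifications = []
    for (user_id, chat_id), group in pending.items():
        # A single message shows its preview; a burst collapses to a count.
        text = group[0].preview if len(group) == 1 else f"{len(group)} new messages"
        notifications.append({"user_id": user_id, "chat_id": chat_id, "text": text})
    return notifications
```

Running the read-cursor check inside the worker, rather than at enqueue time, is what allows a read that happens while the job sits in the queue to cancel the notification.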

End-to-end encryption

If messages are E2E encrypted, push carries only a wake-up signal. The actual content is decrypted on-device after the app fetches the ciphertext over the chat socket.

Sync 12 · phone + laptop + web

Keeping all of a user's devices in step

A user's account isn't tied to one device. Phone, laptop, web — each connects independently, sees the same conversations, and must converge on the same view of read state and message order without coordination from the user.

[Diagram: Bob's server-side state records last_read per device (phone: 9421, laptop: 9418, web: 9421) and a global highest_read = 9421. Phone (device_id = D1, cursor 9421), laptop (device_id = D2, cursor 9418, behind), and web (device_id = D3, cursor 9421) each hold a socket. New message msg_id 9422 is fanned out to all three. A read on the phone acks 9422, and the server broadcasts the new read cursor to laptop and web. On reconnect, a device issues GET msgs > cursor to catch up and replay.]

One cursor per device

Each device persists "last message_id I've seen" per chat. On reconnect, it asks the server for everything newer. The server doesn't need to remember anything per session.

Read state is global

If you read on your phone, the badge disappears on your laptop. Reading on any device pushes the new read cursor to the other devices over their sockets.

Deliver to all, ack from any

An incoming message is pushed to every live socket the user has. "Delivered" status flips as soon as one device acks; "Read" requires an explicit read event.
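
The three rules above (catch up from a cursor, deliver to all and ack from any, global read state) can be sketched together. `messages_after` and `UserState` are hypothetical names, and the model tracks one user's devices for a single chat.

```python
import bisect
from typing import Iterable, List, Set


def messages_after(log: List[int], cursor: int) -> List[int]:
    """Catch-up on reconnect: every message_id newer than the device cursor.

    Assumes `log` is sorted ascending, as a partition sorted by message_id is.
    """
    return log[bisect.bisect_right(log, cursor):]


class UserState:
    """Delivery and read state for one user in one chat, across devices."""

    def __init__(self, device_ids: Iterable[str]) -> None:
        self.device_ids = sorted(device_ids)
        self.delivered: Set[int] = set()
        self.read_upto = 0  # global read cursor shared by all devices

    def ack_delivered(self, device_id: str, message_id: int) -> None:
        # "Delivered" flips as soon as any one device acks.
        self.delivered.add(message_id)

    def mark_read(self, device_id: str, message_id: int) -> List[str]:
        """Advance the global read cursor; return the devices to notify."""
        if message_id <= self.read_upto:
            return []  # stale or duplicate read event
        self.read_upto = message_id
        return [d for d in self.device_ids if d != device_id]

    def status(self, message_id: int) -> str:
        if message_id <= self.read_upto:
            return "read"
        return "delivered" if message_id in self.delivered else "sent"
```

In the diagram's terms, the laptop with cursor 9418 would call `messages_after(log, 9418)` on reconnect, while a read on the phone returns the laptop and web device IDs so the server can push them the new cursor.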

Summary 13 · six principles to remember

Designing a chat system, distilled

If you only carry away six ideas from this chapter, let them be these.

01

WebSocket is the spine

One persistent, full-duplex connection per device. Everything — messages, presence, typing, receipts — flows through it. Long polling is only a fallback.

02

Persist before you deliver

The chat service writes durably first, acks the sender, then routes. Storage is the source of truth; sockets are just a fast path to the same data.

03

Partition by conversation

chat_id is the partition key, message_id (Snowflake) is the sort key. Every conversation is a self-contained log — local reads, local writes, easy to shard.

04

Fan-out where it makes sense

For small groups, copy to per-recipient inboxes. For huge rooms, flip to pull. The right model depends on the ratio of members to active readers.

05

Presence is the loud surface

Heartbeats + TTL are simple in concept and brutal in QPS. Cache aggressively, send only state transitions, and never compute presence the user can't see.

06

Plan for offline from day one

Push gateways, per-device cursors, idempotent acks, dedupe on client_msg_id. Networks fail constantly; the chat experience must not.
