Chapter 10 · System Design

Design a Notification System

A service that reliably reaches every user across push, SMS, email, and in-app channels — at scale, on time, and only when wanted.

A Reader's Guide
The Surfaces

Five channels, five contracts

A notification system is really an orchestrator over a handful of channel-specific delivery contracts. Each one carries its own provider, payload limits, identity model, and cost profile.

Mobile push

APNs · FCM

Short alert delivered through Apple Push Notification service for iOS and Firebase Cloud Messaging for Android. Identity is a device token tied to the app install.

~4 KB payload

SMS

Twilio · MessageBird

Text to a phone number through a carrier gateway. Highest delivery confidence but also highest per-message cost, and limited to 160 characters per segment (70 when the text needs characters outside the GSM-7 alphabet).

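Carrier billed

Email

SendGrid · SES

Rich HTML or plain-text message sent through an email provider's API or SMTP relay. Identity is an email address, and deliverability depends on the sending domain's reputation.

Reputation-gated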

In-app

Your own backend

A message shown inside the app while the user is active. Stored server-side and rendered in the user's feed or behind a bell icon. No third-party provider required.

Read state tracked

Web push

Web Push · VAPID

Browser notification delivered through a service worker subscription. Looks like mobile push but routed through the browser vendor's push endpoint.

Browser-gated

Why this matters

Every channel has a different failure mode, latency profile, and rate limit. Treating them uniformly inside the core system, and isolating quirks at the edges, is the central design challenge.

Cost asymmetry

Push and in-app are nearly free per message. SMS costs cents. Email sits in between. A good system steers traffic to the cheapest channel that still satisfies the user's intent.

Scoping

What the system must do

Before drawing boxes, pin down the volume, the latency budget, and the rules the system must respect on behalf of the user.

Functional

  • Accept notification requests from many internal services through a single API.
  • Deliver across push, SMS, email, in-app, and web push from one logical request.
  • Let users opt out per channel and per topic, and honour those preferences globally.
  • Support both transactional sends (one user, now) and batched campaigns (many users, scheduled).
  • Render messages from server-side templates so non-engineers can edit copy.

Non-functional

  • Handle bursty traffic — product launches and incidents create huge spikes over a calm baseline.
  • End-to-end latency under a few seconds for transactional sends; minutes is fine for campaigns.
  • No duplicate deliveries for the same logical event, even when upstream retries.
  • Survive partial outages of any single provider without dropping messages.
10M sends / day

Order-of-magnitude target for a mid-size consumer app. Peaks are commonly five to ten times the daily average over a few minutes.

≤ 2 s P95 latency

From the producing service calling our API to the provider acknowledging receipt. Anything slower feels broken on transactional flows.

Out of scope (for now)

Voice calls, real-time chat, on-device rich media rendering, and personalised ML ranking of which notifications to send. These are real products but each deserves its own design.

Capacity sketch

10M sends per day across all channels averages to roughly 116 per second (10,000,000 ÷ 86,400 s). Everyday peaks run five to ten times that, and a large campaign can push bursts to 5K per second, roughly 40 times the average. The system needs to absorb that gap without losing messages or melting providers.
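The arithmetic above can be checked with a few lines of Python. The sketch also estimates how much the queue must hold during a burst; the 10-minute burst length and the 1,500 messages per second that workers can drain are hypothetical inputs chosen for illustration, not figures from this design.

```python
# Back-of-envelope capacity math. The burst length and worker drain rate
# below are hypothetical inputs, not figures from the design.

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(sends_per_day: float) -> float:
    """Mean send rate in messages per second."""
    return sends_per_day / SECONDS_PER_DAY


def backlog_after_burst(peak_rate: float, drain_rate: float, burst_seconds: float) -> float:
    """Messages still queued when a burst ends (zero if workers kept up)."""
    return max(0.0, (peak_rate - drain_rate) * burst_seconds)


def drain_seconds(backlog: float, drain_rate: float, baseline_rate: float) -> float:
    """Time to clear a backlog once traffic falls back to the baseline rate."""
    spare = drain_rate - baseline_rate
    if spare <= 0:
        raise ValueError("workers cannot keep up with baseline traffic")
    return backlog / spare


avg = average_rate(10_000_000)                      # about 115.7 msg/s
backlog = backlog_after_burst(5_000, 1_500, 600)    # 10-minute burst: 2,100,000 queued
minutes = drain_seconds(backlog, 1_500, avg) / 60   # about 25 minutes to drain
print(f"avg {avg:.1f}/s, peak {5_000 / avg:.0f}x avg, "
      f"backlog {backlog:,.0f}, drain {minutes:.1f} min")
```

With these inputs the queue ends the burst holding about 2.1 million messages and needs roughly 25 minutes of spare worker capacity to drain, which is the kind of number that sizes queue retention and worker fleets.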

Architecture

A single front door, many delivery lanes

Producers post a logical notification once. A queue absorbs the burst. Per-channel workers fan out to providers, applying retries and tracking on the way.

Figure: end-to-end architecture
  • Producers: Order service, Auth service, Marketing tool, Scheduler (cron)
  • Notification API (auth · validate · render), backed by Templates and User preferences
  • Message queue (Kafka / SQS), one topic per channel
  • Channel workers and their providers: Push → APNs, FCM; SMS → Twilio (SMS gateway); Email → SendGrid, SES (SMTP relay); In-app → Feed store (Cassandra); Web push → Browser push (VAPID endpoint)
  • Event tracking (delivered · opened · clicked), fed by provider webhooks and worker emits
End-to-end flow: producers post once to the API, the queue absorbs the burst, channel workers translate to provider calls, and all delivery events stream back to a tracking store.
Interface

One API, server-rendered templates

Producers send a small, structured envelope. The API resolves the user, picks the channels, renders the template, and hands the result to the queue.

Request envelope

Callers describe what happened and who it concerns. The notification system decides how to reach them.

// POST /v1/notifications
{
  "event":        "order.shipped",
  "recipient_id": "u_91823",
  "channels":      ["push", "email"],
  "data": {
    "order_id":     "A-7710",
    "tracking_url": "https://…"
  },
  "idempotency_key": "order-A-7710-shipped",
  "priority":        "transactional"
}

Why a logical event, not a rendered message

The caller does not pick wording or locale, and its channel list is a request rather than an order: the notification system narrows it using routing rules and user preferences. Copy changes and channel routing therefore do not require re-deploying every producer.
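One way to picture the API's hand-off to the queue is a function that expands the logical envelope into one queued message per channel. This is a minimal sketch under stated assumptions: the `EVENT_CHANNELS` routing table, the `notifications.<channel>` topic names, and the class names are hypothetical, and template rendering is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """The producer's logical request, mirroring the JSON envelope above."""
    event: str
    recipient_id: str
    channels: tuple[str, ...]
    data: dict
    idempotency_key: str
    priority: str = "transactional"


@dataclass(frozen=True)
class QueuedMessage:
    """One per-channel unit of work handed to the queue."""
    topic: str
    channel: str
    user_id: str
    event: str
    locale: str
    data: dict
    idempotency_key: str


# Hypothetical routing rules: channels the system is willing to use per event.
EVENT_CHANNELS = {"order.shipped": {"push", "email", "in_app"}}


def fan_out(env: Envelope, locale: str, user_enabled: set[str]) -> list[QueuedMessage]:
    """Expand one logical request into per-channel messages.

    The caller's channel list is narrowed by the routing rules and by the
    channels the user has enabled; the caller's ordering is preserved.
    """
    allowed = EVENT_CHANNELS.get(env.event, set()) & user_enabled
    return [
        QueuedMessage(
            topic=f"notifications.{ch}",  # hypothetical topic naming
            channel=ch,
            user_id=env.recipient_id,
            event=env.event,
            locale=locale,
            data=env.data,
            idempotency_key=env.idempotency_key,
        )
        for ch in env.channels
        if ch in allowed
    ]
```

For the `order.shipped` example, a user who has push enabled but email disabled yields a single push message; the email entry in the caller's list is quietly dropped.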

Template resolution

  • Each event maps to a set of templates, one per channel and locale.
  • Templates are stored centrally and versioned, with a staging step before production.
  • Rendering happens inside the notification service, never on the device, so copy can change without an app release.
  • Variables come from the data field plus a lookup of profile attributes (name, locale, time zone).
// template: order.shipped · push · en-US
{
  "title": "Your order is on the way",
  "body":  "Order {{order_id}} ships today. Tap to track.",
  "deeplink": "app://orders/{{order_id}}"
}
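Rendering a template like the one above comes down to substituting `{{name}}` placeholders and failing loudly when a variable is missing. The sketch below is a hand-rolled illustration that only handles plain substitution; a production service would more likely use an established engine such as Mustache or Jinja, and email bodies would also need HTML escaping. The caller is assumed to merge the request's `data` with profile attributes before calling it.

```python
import re

# Matches {{ name }} placeholders; whitespace inside the braces is allowed.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


class MissingVariable(KeyError):
    """Raised when a template references a variable the request did not supply."""


def render(template: dict[str, str], variables: dict[str, object]) -> dict[str, str]:
    """Fill every {{var}} in every field, or raise listing all missing names."""
    missing = {
        name
        for text in template.values()
        for name in PLACEHOLDER.findall(text)
        if name not in variables
    }
    if missing:
        raise MissingVariable(sorted(missing))
    return {
        field: PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), text)
        for field, text in template.items()
    }


push_en_us = {
    "title": "Your order is on the way",
    "body": "Order {{order_id}} ships today. Tap to track.",
    "deeplink": "app://orders/{{order_id}}",
}
print(render(push_en_us, {"order_id": "A-7710"})["body"])
# Order A-7710 ships today. Tap to track.
```

Collecting every missing name before raising lets the API report all problems in one synchronous error rather than one at a time.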

Validation up front

The API rejects requests with unknown events, missing template variables, or unverified senders synchronously. Cheap failure beats discovering broken sends in a downstream worker.
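A synchronous validation step might look like the following sketch. The registries (`KNOWN_EVENTS`, `VERIFIED_SENDERS`, and so on) and the `campaign` priority value are hypothetical placeholders; in practice the required variables would be derived from the template store rather than hard-coded.

```python
# Hypothetical registries. Required variables per event would normally be
# derived from the templates themselves.
KNOWN_EVENTS = {"order.shipped": {"order_id", "tracking_url"}}
KNOWN_CHANNELS = {"push", "sms", "email", "in_app", "web_push"}
PRIORITIES = {"transactional", "campaign"}
VERIFIED_SENDERS = {"order-service", "auth-service"}


def validate(request: dict, sender: str) -> list[str]:
    """Return every problem found; an empty list means the request is accepted."""
    errors = []
    if sender not in VERIFIED_SENDERS:
        errors.append(f"unverified sender: {sender}")
    event = request.get("event")
    if event not in KNOWN_EVENTS:
        errors.append(f"unknown event: {event}")
    else:
        missing = KNOWN_EVENTS[event] - set(request.get("data", {}))
        if missing:
            errors.append(f"missing template variables: {sorted(missing)}")
    unknown = set(request.get("channels", [])) - KNOWN_CHANNELS
    if unknown:
        errors.append(f"unknown channels: {sorted(unknown)}")
    if not request.get("recipient_id"):
        errors.append("recipient_id is required")
    if not request.get("idempotency_key"):
        errors.append("idempotency_key is required")
    if request.get("priority", "transactional") not in PRIORITIES:
        errors.append(f"unknown priority: {request.get('priority')}")
    return errors
```

The API would translate a non-empty list into a 4xx response before anything touches the queue.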

Decoupling

A queue between intent and delivery

Producers run on their own schedules. Providers have their own rate limits. The queue is the shock absorber that keeps each side from breaking the other.

Figure: the queue as shock absorber
  • Producer rate (spiky, bursty) → Notification topic (buffer, ordered per partition) → Consumer rate (steady, rate-limited)
  • Partitioned by user_id: per-user ordering is preserved within the same channel.
  • Each partition is consumed by exactly one worker at a time.
  • Scale fan-out by adding partitions and workers in lock-step.
A bursty input meets a steady output. The queue holds the difference so neither end has to flex unnaturally.

Producer-side benefit

The API returns success the moment the message lands in the queue. Producers do not wait for the provider hop, and they do not get blocked when a provider is slow.

Consumer-side benefit

Channel workers pull at exactly their provider's allowed rate. Backpressure is automatic: when the worker is slow, the queue grows; when it catches up, the queue drains.

One topic per channel

Separating push, SMS, email, and web push topics means a slow email provider never starves push delivery, and each worker fleet can be scaled and tuned independently.
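The routing described on this slide, one topic per channel and a partition chosen by `user_id`, can be sketched as below. The topic naming is hypothetical. With Kafka you would normally just set `user_id` as the record key and let the client's default partitioner (murmur2) do the equivalent; the explicit hash here only makes the idea visible.

```python
import hashlib


def topic_for(channel: str) -> str:
    """One topic per channel (hypothetical naming scheme)."""
    return f"notifications.{channel}"


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user to a partition with a stable hash.

    Python's built-in hash() is salted per process, so a digest is used
    to get the same answer on every producer and after every restart.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(channel: str, user_id: str, num_partitions: int) -> tuple[str, int]:
    """Where a message for (channel, user) lands: all of a user's push goes to one partition."""
    return topic_for(channel), partition_for(user_id, num_partitions)
```

One consequence of modulo partitioning: changing `num_partitions` remaps most users, so per-user ordering is only guaranteed again once in-flight messages on the old layout have drained.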

The Edges

Providers: each one is its own special snowflake

All the messy realities of the outside world live in the channel workers. Each integrates with one provider and shields the rest of the system from its quirks.

| Channel | Provider | Identity | Rate-limit shape |
|---|---|---|---|
| Push (iOS) | APNs (HTTP/2) | Device token per install | Concurrency per connection; no published hard QPS, but back off when it returns 429 |
| Push (Android) | FCM | Registration token per install | Project-level quota; the HTTP v1 API takes one token per request, so throughput comes from concurrent requests (topic messaging covers broadcasts) |
| SMS | Twilio / MessageBird | E.164 phone number | Per-sender QPS and country-specific throttles; carrier filtering is a separate problem |
| Email | SendGrid / SES | Email address | Daily quotas tied to reputation; warm-up periods for new sending domains |
| Web push | VAPID endpoints | Browser subscription URL | Per-endpoint, set by the browser vendor; 410 means the subscription is gone for good |

Adapter pattern

Inside each worker, a thin adapter speaks the provider's protocol. Above it sits a uniform interface that the rest of the system uses: send, cancel, status. Swapping SendGrid for SES becomes a contained refactor.
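The adapter idea can be sketched as an abstract interface plus one concrete implementation per provider. The class names and the in-memory fake below are hypothetical; a real adapter would wrap the provider's SDK or HTTP API behind the same three methods.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACCEPTED = "accepted"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class Outbound:
    to: str      # device token, phone number, or email address
    subject: str
    body: str


class ChannelAdapter(ABC):
    """The uniform shape every provider integration exposes to the worker."""

    @abstractmethod
    def send(self, msg: Outbound) -> str:
        """Hand the message to the provider and return its message id."""

    @abstractmethod
    def cancel(self, message_id: str) -> bool:
        """Try to stop a message the provider has not sent yet."""

    @abstractmethod
    def status(self, message_id: str) -> Status:
        """Report the provider's view of a message."""


class FakeEmailAdapter(ChannelAdapter):
    """Stand-in for a SendGrid or SES adapter; it makes no network calls."""

    def __init__(self, provider: str):
        self.provider = provider
        self._messages: dict[str, Status] = {}

    def send(self, msg: Outbound) -> str:
        message_id = f"{self.provider}-{len(self._messages) + 1}"
        self._messages[message_id] = Status.ACCEPTED
        return message_id

    def cancel(self, message_id: str) -> bool:
        if self._messages.get(message_id) is Status.ACCEPTED:
            self._messages[message_id] = Status.CANCELLED
            return True
        return False

    def status(self, message_id: str) -> Status:
        return self._messages[message_id]


def deliver(adapter: ChannelAdapter, msg: Outbound) -> str:
    """Worker code depends only on the interface, never on a concrete provider."""
    return adapter.send(msg)
```

Swapping providers then means constructing a different adapter at startup; `deliver` and everything above it stay unchanged.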

Per-provider rate buckets

The worker enforces its own token bucket. If the budget is 300 emails per second, the worker holds traffic at 300 even when the queue contains millions. The queue absorbs the rest.
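A token bucket fits in a small class. This sketch is per process; the injectable `clock` parameter is an assumption added so the behaviour can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allow `rate` operations per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self, n: float = 1.0) -> bool:
        """Take n tokens if available; never blocks."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def acquire(self, n: float = 1.0) -> None:
        """Block until n tokens are available (n must not exceed capacity)."""
        while not self.try_acquire(n):
            time.sleep((n - self.tokens) / self.rate)
```

With `rate=300`, a worker calls `acquire()` before each provider request and never exceeds 300 sends per second, however deep the queue is. A fleet of workers has to share the provider's budget, for example by giving each worker `300 / fleet_size` or by using a central limiter.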

Failover providers

For SMS especially, having a second provider configured per region keeps deliveries flowing when the primary has an outage. The worker tries the secondary after a small number of failed attempts on the primary.
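The failover rule can be sketched as an ordered list of providers, each tried a small number of times. `ProviderError`, the two-attempt default, and the send-function shape are assumptions for illustration; the sketch also omits the backoff a real worker would apply between tries.

```python
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Raised by a provider call on a retryable failure such as a 5xx or timeout."""


SendFn = Callable[[str, str], str]  # (phone_number, text) -> provider message id


def send_with_failover(
    providers: Sequence[Tuple[str, SendFn]],
    to: str,
    text: str,
    attempts_per_provider: int = 2,
) -> Tuple[str, str]:
    """Try each provider in order, a few times each; return (provider, message_id)."""
    last_error: Optional[Exception] = None
    for name, send in providers:
        for _ in range(attempts_per_provider):
            try:
                return name, send(to, text)
            except ProviderError as exc:
                last_error = exc  # a real worker would back off before the next try
    raise RuntimeError("all SMS providers failed") from last_error
```

Only retryable failures trigger failover here; a permanent error such as an invalid number should propagate immediately rather than be retried on a second provider.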

Resilience

Retries, backoff, and a dead-letter queue

Providers fail. Networks blip. The system's job is to know which failures are worth retrying, how long to wait between tries, and what to do with the messages that never go through.

Figure: retry path for the push topic
  • Channel queue (push topic) → Worker (attempt n) → APNs / FCM, which answers 2xx, 4xx, or 5xx
  • 2xx → done
  • 4xx invalid token → drop and clean up
  • 5xx / 429 / timeout → retry: re-enqueue with delay 1 s · 4 s · 16 s · 64 s
  • After N attempts → Dead-letter queue → inspect and replay with on-call tooling
Retries handle transient failure. The DLQ catches the rest so they can be inspected — not silently lost.

Classify before retrying

Most 4xx responses mean the message itself is bad: a wrong token, an unverified sender, a malformed payload. Retrying them just wastes provider quota. Only 5xx responses, timeouts, and 429s are retried; 429 is the one 4xx worth retrying because it signals rate limiting, not a bad message.
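The classification rule maps naturally onto a small function. This sketch assumes the adapter reports the HTTP status code, or `None` when the call timed out or the connection failed; provider-specific reasons (such as which 4xx codes should also trigger token cleanup) would be layered on top.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DONE = "done"
    DROP = "drop"    # bad message: do not retry, clean up if needed
    RETRY = "retry"  # transient: re-enqueue with backoff


def classify(status: Optional[int]) -> Action:
    """Decide what to do with a provider response.

    `status` is the HTTP status code, or None for a timeout or network error.
    """
    if status is None:
        return Action.RETRY      # timeout / connection failure
    if 200 <= status < 300:
        return Action.DONE
    if status == 429:
        return Action.RETRY      # rate limited: the message itself is fine
    if 400 <= status < 500:
        return Action.DROP       # e.g. invalid token, malformed payload
    if status >= 500:
        return Action.RETRY
    return Action.DROP           # anything else is treated as non-retryable
```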

Exponential backoff with jitter

Each retry waits a fixed multiple longer than the previous one (the diagram uses a factor of four: 1 s, 4 s, 16 s, 64 s), plus random jitter so a fleet of workers does not all re-hit a wounded provider in the same second.
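A backoff function reproducing the diagram's 1 s, 4 s, 16 s, 64 s schedule might look like this. The ±20% proportional jitter and the 300-second cap are assumptions; "full jitter", drawing uniformly between zero and the exponential ceiling, is a common alternative that spreads retries even more.

```python
import random


def backoff_delay(attempt, base=1.0, factor=4.0, cap=300.0, jitter=0.2, rng=None):
    """Seconds to wait before retry number `attempt` (1-based).

    Nominal schedule: base * factor ** (attempt - 1), i.e. 1 s, 4 s, 16 s, 64 s
    with the defaults. The delay is then scaled by a random factor in
    [1 - jitter, 1 + jitter] and clamped to `cap`.
    """
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    r = rng if rng is not None else random
    nominal = base * factor ** (attempt - 1)
    return min(cap, nominal * r.uniform(1 - jitter, 1 + jitter))
```

Passing a seeded `random.Random` as `rng` makes the delays reproducible in tests; production code can rely on the module-level generator.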

Bounded attempts

After five or six tries the message lands in a dead-letter queue. Engineers get an alert if the DLQ grows beyond a small baseline, and a replay tool can re-submit messages once the underlying problem is fixed.
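The bookkeeping for bounded attempts and a replayable DLQ can be sketched with in-memory deques standing in for the real queues. `MAX_ATTEMPTS = 5` follows the text; the alert threshold of 100 is a hypothetical baseline.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_ATTEMPTS = 5           # "five or six tries"
DLQ_ALERT_THRESHOLD = 100  # hypothetical baseline before paging on-call


@dataclass
class Message:
    id: str
    payload: dict
    attempts: int = 0
    errors: list = field(default_factory=list)


def on_retryable_failure(msg: Message, error: str, retry_queue: deque, dlq: deque) -> str:
    """Record a failed attempt, then re-enqueue the message or park it in the DLQ."""
    msg.attempts += 1
    msg.errors.append(error)
    if msg.attempts >= MAX_ATTEMPTS:
        dlq.append(msg)
        return "dead-lettered"
    retry_queue.append(msg)  # a real system would also attach the backoff delay
    return "retrying"


def dlq_needs_alert(dlq: deque) -> bool:
    return len(dlq) > DLQ_ALERT_THRESHOLD


def replay(dlq: deque, queue: deque) -> int:
    """Move dead-lettered messages back to the main queue with a fresh attempt budget."""
    count = 0
    while dlq:
        msg = dlq.popleft()
        msg.attempts = 0  # error history is kept for post-incident review
        queue.append(msg)
        count += 1
    return count
```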

Exactly Once-ish

Deduplication & idempotency

Distributed systems retry. Caller scripts retry. Queues redeliver. The notification system must guarantee that the user sees each logical message at most once, no matter how many times its components try.

The idempotency key

Every notification carries a stable key chosen by the producer. For a shipped order it might be order-A-7710-shipped. When a second request arrives with the same key, the API treats it as a duplicate: it skips the send and returns the original response.

Two checks, two layers

  • At the API: a fast Redis lookup keyed by idempotency_key with a TTL of 24–48 hours.
  • At the worker: a second check just before calling the provider, keyed by (user_id, channel, key), to catch redeliveries from the queue itself.
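Both checks are "set if absent, with expiry" operations. The sketch below uses an in-memory class as a single-process stand-in for Redis; with redis-py the equivalent call is `r.set(key, "1", nx=True, ex=ttl_seconds)`, which returns a truthy value only for the first writer. The key prefixes `api:` and `sent:` are hypothetical.

```python
import time


class TtlSet:
    """In-memory stand-in for Redis `SET key value NX EX ttl` (single process only)."""

    def __init__(self, clock=time.monotonic):
        self._expiry: dict[str, float] = {}
        self._clock = clock

    def set_nx(self, key: str, ttl_seconds: float) -> bool:
        """Store the key if absent or expired; return True only for the first writer."""
        now = self._clock()
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True


TTL_SECONDS = 48 * 3600  # upper end of the 24 to 48 hour window


def api_accept(store: TtlSet, idempotency_key: str) -> bool:
    """Layer 1: reject a repeated request for the same logical event."""
    return store.set_nx(f"api:{idempotency_key}", TTL_SECONDS)


def worker_should_send(store: TtlSet, user_id: str, channel: str, key: str) -> bool:
    """Layer 2: catch queue redeliveries just before the provider call."""
    return store.set_nx(f"sent:{user_id}:{channel}:{key}", TTL_SECONDS)
```

Marking the key before the provider call means a worker crash between the mark and the call loses that message; marking after the call risks a duplicate instead. Choosing "mark first" is what turns at-least-once delivery into at-most-once per (user, channel, event).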

What the key should encode

It should reflect the logical event, not the request. Two different services sending an "order shipped" notification for the same order must produce the same key, so the second send is correctly suppressed.

Figure: dedup at two layers gives at-most-once per (user, channel, event)
  • Try #1 and Try #2 both arrive with key=A-7710.
  • Notification API dedup check: SET NX key 48h in Redis. Try #1 is accepted; Try #2 is answered "already sent".
  • Channel queue: may redeliver.
  • Worker: second dedup check.
  • Provider: exactly one call.
Consent

User preferences are not optional

Even the most perfectly delivered notification is wrong if the user did not want it. The system enforces preferences centrally so every channel and every event respects them.

Channel opt-outs

A user can disable a whole channel — "no SMS, ever" — at the account level. The API drops any send targeted at that channel before it reaches the queue.

Per-topic toggles

Finer-grained controls let users keep transactional messages while muting marketing. Each event is tagged with a topic, and the topic is checked against the user's preferences at send time.

Quiet hours

Notifications scheduled during a user's local night-time are held until morning, with an exception for high-priority transactional events the user has explicitly allowed.
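Deferring to the user's local morning is mostly date arithmetic. To keep this sketch dependency-free, the user's zone is modelled as a fixed UTC offset; a real service should store an IANA zone name and use `zoneinfo.ZoneInfo` so daylight-saving changes are handled. The 22:00 to 08:00 window and the `urgent_allowed` flag are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# Hypothetical default window; real users would configure their own.
QUIET_START = time(22, 0)
QUIET_END = time(8, 0)


def in_quiet_hours(local: datetime, start: time = QUIET_START, end: time = QUIET_END) -> bool:
    """True if the user's local wall-clock time falls inside the quiet window."""
    t = local.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # the window wraps past midnight


def release_time(now_utc: datetime, user_tz: timezone, priority: str, urgent_allowed: bool) -> datetime:
    """When the notification may go out, as an aware UTC datetime."""
    if priority == "transactional" and urgent_allowed:
        return now_utc  # the user opted in to urgent sends at night
    local = now_utc.astimezone(user_tz)
    if not in_quiet_hours(local):
        return now_utc
    release = local.replace(hour=QUIET_END.hour, minute=QUIET_END.minute, second=0, microsecond=0)
    if release <= local:
        release += timedelta(days=1)  # still evening, so morning is tomorrow
    return release.astimezone(timezone.utc)
```

For a user at UTC-5, a campaign push produced at 04:30 UTC (23:30 local) is held until 13:00 UTC, which is 08:00 the next local morning.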

Frequency caps

A counter per (user, channel) ensures we never send more than, say, three marketing pushes in a day or one promotional email per week — regardless of how many campaigns target them.
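Frequency caps can be sketched as fixed-window counters. This version adds a topic to the text's (user, channel) key so that only marketing is capped; the `CAPS` table mirrors the examples above but is otherwise hypothetical. In production the counters would live in a shared store (for example Redis `INCR` plus `EXPIRE`) rather than in process memory.

```python
from collections import defaultdict
from datetime import date

# Hypothetical limits: (channel, topic) -> (max sends, window length in days).
CAPS = {("push", "marketing"): (3, 1), ("email", "marketing"): (1, 7)}


class FrequencyCap:
    """Fixed-window counters per (user, channel, topic); an in-memory stand-in."""

    def __init__(self):
        self.counts = defaultdict(int)

    def allow(self, user_id: str, channel: str, topic: str, day: date) -> bool:
        """Return True and count the send if it fits under the cap."""
        cap = CAPS.get((channel, topic))
        if cap is None:
            return True  # uncapped combination, e.g. transactional
        limit, days = cap
        window = day.toordinal() // days  # fixed windows of `days` days
        key = (user_id, channel, topic, window)
        if self.counts[key] >= limit:
            return False
        self.counts[key] += 1
        return True
```

Fixed windows are simple but let a user receive a full quota at the end of one window and again at the start of the next; a sliding window avoids that at the cost of storing timestamps.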

Regulatory floor

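Laws such as GDPR, CAN-SPAM, and TCPA add a hard floor: consent before sending marketing where the law requires it (TCPA for marketing texts, GDPR in the EU), a working unsubscribe mechanism in every commercial email, and deletion of personal data on request. The preference service is the audit trail that records when and how each consent was given or withdrawn.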

Figure: four preference gates between the Notification API (incoming request) and the queue
  1 · Channel opt-out? Drop if the user has disabled this channel.
  2 · Topic muted? Marketing vs. transactional split.
  3 · Quiet hours? Defer until the user's local morning.
  4 · Frequency cap? Skip if the daily or weekly limit is hit.
  → Enqueue for delivery.
Four gates, evaluated in order. Any failure drops or defers the message before it reaches the queue.
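The four gates can be sketched as an ordered list where the first gate that fires decides the outcome. To keep the sketch short, quiet hours and the frequency cap arrive as precomputed booleans; the class and field names are hypothetical. A real implementation should only consume a frequency-cap slot after the earlier gates have passed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    ENQUEUE = "enqueue"
    DROP = "drop"
    DEFER = "defer"


@dataclass
class Preferences:
    disabled_channels: set = field(default_factory=set)
    muted_topics: set = field(default_factory=set)


@dataclass
class Request:
    channel: str
    topic: str            # e.g. "marketing" or "transactional"
    in_quiet_hours: bool  # precomputed from the user's local time
    cap_reached: bool     # precomputed from the frequency-cap store


def evaluate_gates(prefs: Preferences, req: Request) -> Tuple[Decision, Optional[str]]:
    """Run the gates in order; return the decision and the gate that made it."""
    gates = [
        ("channel opt-out", req.channel in prefs.disabled_channels, Decision.DROP),
        ("topic muted", req.topic in prefs.muted_topics, Decision.DROP),
        ("quiet hours", req.in_quiet_hours, Decision.DEFER),
        ("frequency cap", req.cap_reached, Decision.DROP),
    ]
    for name, fired, decision in gates:
        if fired:
            return decision, name
    return Decision.ENQUEUE, None
```

Returning the gate name alongside the decision gives the preference service the reason it needs for its audit trail.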
Feedback Loop

Tracking: did it land, did it work?

Every send emits a stream of events that flow into an analytics pipeline. Without this loop the system is operating blind — and channel routing decisions stay stuck on guesswork.

Delivered

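The provider has accepted the message, as recorded from its API response, a webhook callback, or a final status check. For push, provider acceptance is not proof that the device received it; device-level confirmation, where needed, comes from the client SDK.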

Opened

The user has seen the notification — tapped a push, opened an email (tracked by a tiny image pixel), or expanded an in-app card.

Clicked

The user followed the call to action. This is the conversion signal that marketing and product teams actually optimise for.

Figure: tracking pipeline
  • Sources: worker emits; provider webhooks (delivered / bounced); client SDK (opened / clicked)
  • Events topic: notification.events
  • Consumers: realtime dashboards (delivery health, alerts); warehouse (BigQuery / Snowflake); frequency-cap store (counters per (user, channel)); token cleanup job; routing decisions (which channel wins)
Every step of the funnel emits an event. The same pipeline feeds dashboards, the warehouse, and the counters that drive frequency caps.
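A dashboard over the events topic ultimately computes funnel rates per channel. This sketch assumes a hypothetical event shape with a notification id, a channel, and a `kind` such as `sent` or `opened`. It counts each notification at most once per stage, because webhooks can be redelivered and users can open the same message twice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    notification_id: str
    channel: str
    kind: str  # "sent", "delivered", "opened", "clicked", "bounced"


def funnel(events, channel: str) -> dict:
    """Per-channel funnel rates relative to sends, deduplicated per stage."""
    seen = {(e.notification_id, e.kind) for e in events if e.channel == channel}
    counts = Counter(kind for _, kind in seen)
    sent = counts["sent"]

    def rate(kind: str) -> float:
        return counts[kind] / sent if sent else 0.0

    return {
        "sent": sent,
        "delivered": rate("delivered"),
        "opened": rate("opened"),
        "clicked": rate("clicked"),
    }
```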

Closing the loop

A bounced email triggers cleanup of the address. A push with an "unregistered token" response removes the token from the user's device list. A consistently ignored channel earns lower priority next time.
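The cleanup half of the loop can be sketched as a handler that removes addresses a provider reports as permanently dead. The `outcome` labels are hypothetical normalised values the channel adapters would produce, for example from an APNs 410 response with reason `Unregistered`, an FCM `UNREGISTERED` error, or an email hard-bounce webhook.

```python
from dataclasses import dataclass, field


@dataclass
class Contacts:
    """A user's delivery addresses (in-memory stand-in for the contact store)."""
    push_tokens: set = field(default_factory=set)
    emails: set = field(default_factory=set)


def handle_feedback(contacts: Contacts, channel: str, address: str, outcome: str) -> bool:
    """Remove an address the provider says is permanently dead; return True if removed."""
    if channel == "push" and outcome == "unregistered":
        removed = address in contacts.push_tokens
        contacts.push_tokens.discard(address)
        return removed
    if channel == "email" and outcome == "hard_bounce":
        removed = address in contacts.emails
        contacts.emails.discard(address)
        return removed
    return False  # soft bounces and transient errors leave the address alone
```

Keeping soft bounces out of the cleanup path matters: a full mailbox today is not a dead address, and removing it would silently cut the user off.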

Pixel and webhook reality

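Email opens come from a tracking pixel that many clients block and some (such as Apple Mail Privacy Protection) prefetch automatically, so the metric is noisy in both directions. Push opens are more reliable because the app's SDK records the tap directly when the OS opens the app from the notification.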

Two stores, two jobs

Realtime metrics live in a fast streaming database for on-call dashboards. The deeper analytical work — campaign A/B tests, cohort retention — runs in the warehouse on the same event stream.

Takeaways

Principles for a notification system

Six ideas that keep the system reliable as it scales, and respectful as it grows louder.

01

Decouple producers from providers with a queue

The queue is the heart of the design. Producers stay fast, providers stay within their limits, and the system survives spikes that neither end could absorb alone.

02

Push provider quirks to the edges

Channel workers are the only places that know about APNs payload sizes, Twilio carrier filters, or SES warm-up rules. Everything inside speaks one clean shape.

03

Retry the transient, dead-letter the rest

Exponential backoff with jitter for 5xx responses, 429s, and timeouts. A DLQ with tooling and alerts for anything that cannot be recovered automatically. Never silently drop a message.

04

Idempotency keys make at-least-once delivery safe

Make the producer pick a stable key per logical event. Check it at the API and again at the worker. The user sees the message at most once even when the system retries five times.

05

Preferences are a first-class gate

Opt-outs, topic mutes, quiet hours, and frequency caps live in the request path, not in afterthought filters. The right notification at the wrong time is still a wrong notification.

06

Close the loop with tracking

Every delivery emits an event. Those events power dashboards, frequency caps, token cleanup, and the routing decisions that pick the best channel for next time.
