A service that reliably reaches every user across push, SMS, email, and in-app channels — at scale, on time, and only when wanted.
A notification system is really an orchestrator over a handful of channel-specific delivery contracts. Each one carries its own provider, payload limits, identity model, and cost profile.
Short alert delivered through Apple Push Notification service for iOS and Firebase Cloud Messaging for Android. Identity is a device token tied to the app install.
Text to a phone number through a carrier gateway. Highest delivery confidence but also highest per-message cost, and constrained to 160 characters per segment (GSM-7 encoding; 70 once the text needs Unicode).
HTML or plain-text message to an address. Rich layout possible, but deliverability depends on sender reputation, DKIM, SPF, and the recipient's spam filter.
A message shown inside the app while the user is active. Stored server-side and rendered on the user's feed or bell icon. No third-party provider required.
Browser notification delivered through a service worker subscription. Looks like mobile push but routed through the browser vendor's push endpoint.
Every channel has a different failure mode, latency profile, and rate limit. Treating them uniformly inside the core system, and isolating quirks at the edges, is the central design challenge.
Push and in-app are nearly free per message. SMS costs cents. Email sits in between. A good system steers traffic to the cheapest channel that still satisfies the user's intent.
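Here is a minimal Python sketch of cheapest-channel selection. It assumes the event's intent can be expressed as a list of acceptable channels. The `RELATIVE_COST` ranks, the `Recipient` type, and `pick_channel` are hypothetical. The ranks encode only the ordering described above, not real prices, and placing web push alongside mobile push is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical relative cost ranks (lower is cheaper). They encode the ordering
# in the text (push and in-app nearly free, email in between, SMS dearest), not prices.
RELATIVE_COST = {"in_app": 0, "push": 1, "web_push": 1, "email": 2, "sms": 3}


@dataclass
class Recipient:
    reachable: Set[str]  # channels with a valid token, address, or number on file
    opted_out: Set[str]  # channels the user has disabled


def pick_channel(acceptable: List[str], user: Recipient) -> Optional[str]:
    """Return the cheapest channel the event accepts and the user can receive, or None."""
    candidates = [
        c for c in acceptable
        if c in user.reachable and c not in user.opted_out
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: RELATIVE_COST[c])
```

Suppose an event accepts `["sms", "push"]`. It goes out by push when the user has a token, and falls back to SMS only when push is unavailable or disabled.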
Before drawing boxes, pin down the volume, the latency budget, and the rules the system must respect on behalf of the user.
Order-of-magnitude target for a mid-size consumer app. Peaks are commonly five to ten times the daily average over a few minutes.
From the producing service calling our API to the provider acknowledging receipt. Anything slower feels broken on transactional flows.
Voice calls, real-time chat, on-device rich media rendering, and personalised ML ranking of which notifications to send. These are real products but each deserves its own design.
10M sends per day across all channels averages to roughly 115 per second. A campaign, however, can push the rate to 5K per second. That is about 40 times the average, well beyond the everyday five-to-ten-times peak. The system needs to absorb that gap without losing messages or melting providers.
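The throughput arithmetic above, plus a rough backlog estimate, fits in a few lines of Python. The 10-minute burst length and the 1,000-per-second drain rate are illustrative assumptions, not figures from the design.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_sends = 10_000_000
average_rate = daily_sends / SECONDS_PER_DAY   # ≈ 115.7 sends per second
campaign_peak = 5_000                          # sends per second during a campaign
peak_ratio = campaign_peak / average_rate      # ≈ 43x the daily average

# Illustrative assumptions: a 10-minute blast against workers that drain 1,000/s.
burst_seconds = 10 * 60
drain_rate = 1_000

# Messages that pile up in the queue while arrivals outpace the workers.
backlog = (campaign_peak - drain_rate) * burst_seconds

# Time to clear the backlog once arrivals fall back to the average rate.
catch_up_seconds = backlog / (drain_rate - average_rate)

print(f"average {average_rate:.1f}/s, peak {peak_ratio:.0f}x, "
      f"backlog {backlog:,}, drained in ~{catch_up_seconds / 60:.0f} min")
```

Under those assumptions the queue holds about 2.4M messages at the end of the blast. Draining them takes roughly 45 minutes, which is the gap the queue has to absorb.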
Producers post a logical notification once. A queue absorbs the burst. Per-channel workers fan out to providers, applying retries and tracking on the way.
Producers send a small, structured envelope. The API resolves the user, picks the channels, renders the template, and hands the result to the queue.
Callers describe what happened and who it concerns. The notification system decides how to reach them.
```jsonc
// POST /v1/notifications
{
  "event": "order.shipped",
  "recipient_id": "u_91823",
  "data": {
    "order_id": "A-7710",
    "tracking_url": "https://…"
  },
  "idempotency_key": "order-A-7710-shipped",
  "priority": "transactional"
}
```
The caller does not pick wording, channels, or locale. That is the notification system's job, so copy changes and channel routing do not require re-deploying every producer.
```jsonc
// template: order.shipped · push · en-US
{
  "title": "Your order is on the way",
  "body": "Order {{order_id}} ships today. Tap to track.",
  "deeplink": "app://orders/{{order_id}}"
}
```
The API rejects requests with unknown events, missing template variables, or unverified senders synchronously. Cheap failure beats discovering broken sends in a downstream worker.
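Here is a minimal Python sketch of the synchronous check. It renders the `{{name}}` placeholders used in the order.shipped template and raises before anything is queued if a variable is missing. `render`, `required_vars`, and `TemplateError` are hypothetical names. A real service would likely use an established template engine rather than a regex.

```python
import re
from typing import Dict, Set

# Matches {{name}} placeholders such as {{order_id}}, tolerating inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


class TemplateError(ValueError):
    """Raised while handling the request, so the API can reject it synchronously."""


def required_vars(template: Dict[str, str]) -> Set[str]:
    """Collect every placeholder name used across the template's fields."""
    return {name for text in template.values() for name in PLACEHOLDER.findall(text)}


def render(template: Dict[str, str], data: Dict[str, str]) -> Dict[str, str]:
    """Fill every field of the template, or raise if any variable is missing."""
    missing = required_vars(template) - set(data)
    if missing:
        raise TemplateError(f"missing template variables: {sorted(missing)}")
    return {
        field: PLACEHOLDER.sub(lambda m: str(data[m.group(1)]), text)
        for field, text in template.items()
    }
```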
Producers run on their own schedules. Providers have their own rate limits. The queue is the shock absorber that keeps each side from breaking the other.
The API returns success the moment the message lands in the queue. Producers do not wait for the provider hop, and they do not get blocked when a provider is slow.
Channel workers pull no faster than their provider's allowed rate. Backpressure is automatic: when the workers fall behind, the queue grows; when they catch up, the queue drains.
Separating push, SMS, email, and web push topics means a slow email provider never starves push delivery, and each worker fleet can be scaled and tuned independently.
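Here is a toy Python sketch of the accept-then-return contract with one topic per channel. In-process `queue.Queue` objects stand in for durable broker topics. The `accept` function and the `resolved_channels` field (the channels the API picked for this send) are hypothetical.

```python
import queue

CHANNELS = ("push", "sms", "email", "web_push")

# One topic per channel. In-process queues stand in for durable broker topics here.
topics = {c: queue.Queue() for c in CHANNELS}


def accept(notification: dict) -> dict:
    """Enqueue one message per resolved channel and return without waiting on any provider."""
    for channel in notification["resolved_channels"]:
        topics[channel].put({**notification, "channel": channel})
    return {"status": "accepted", "idempotency_key": notification["idempotency_key"]}
```

A backlog of a thousand queued emails leaves the push topic untouched. The push worker still finds its next message at the head of its own queue.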
All the messy realities of the outside world live in the channel workers. Each integrates with one provider and shields the rest of the system from its quirks.
| Channel | Provider | Identity | Rate-limit shape |
|---|---|---|---|
| Push iOS | APNs (HTTP/2) | Device token per install | Connection-level concurrency, no hard QPS but they expect you to back off on 429 |
| Push Android | FCM | Registration token per install | Project-level QPS quota; the HTTP v1 API targets one token per request, so large fan-outs mean many calls or a topic send |
| SMS | Twilio / MessageBird | E.164 phone number | Per-sender QPS and country-specific throttles; carrier filtering is a separate problem |
| Email | SendGrid / SES | Email address | Daily quotas tied to reputation; warm-up periods for new sending domains |
| Web push | Browser push services (Web Push protocol, VAPID auth) | Browser subscription URL | Per-endpoint, set by the browser vendor; 410 means the subscription is gone for good |
Inside each worker, a thin adapter speaks the provider's protocol. Above it sits a uniform interface that the rest of the system uses: send, cancel, status. Swapping SendGrid for SES becomes a contained refactor.
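The uniform interface might look like the following Python sketch. `ChannelAdapter`, `SendResult`, and `FakeEmailAdapter` are hypothetical names. The fake adapter keeps state in memory, where a real SES or SendGrid adapter would call that provider's API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SendResult:
    provider_message_id: Optional[str]
    status_code: int


class ChannelAdapter(ABC):
    """The one shape the rest of the system sees; one subclass per provider."""

    @abstractmethod
    def send(self, recipient: str, payload: dict) -> SendResult: ...

    @abstractmethod
    def cancel(self, provider_message_id: str) -> bool: ...

    @abstractmethod
    def status(self, provider_message_id: str) -> str: ...


class FakeEmailAdapter(ChannelAdapter):
    """Hypothetical in-memory stand-in; a real adapter would speak the provider's protocol."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}

    def send(self, recipient: str, payload: dict) -> SendResult:
        msg_id = f"fake-{len(self._state) + 1}"
        self._state[msg_id] = "accepted"
        return SendResult(msg_id, 202)

    def cancel(self, provider_message_id: str) -> bool:
        # Only messages the provider has not yet handed off can be cancelled.
        if self._state.get(provider_message_id) == "accepted":
            self._state[provider_message_id] = "cancelled"
            return True
        return False

    def status(self, provider_message_id: str) -> str:
        return self._state.get(provider_message_id, "unknown")
```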
The worker enforces its own token bucket. If the budget is 300 emails per second, the worker holds traffic at 300 even when the queue contains millions. The queue absorbs the rest.
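Here is a minimal token bucket in Python. It assumes the 300-per-second budget from the example and a burst capacity equal to one second of budget. The clock is injectable so the behaviour can be checked without sleeping. `TokenBucket` is a hypothetical name.

```python
import time
from typing import Callable


class TokenBucket:
    """Permits `rate` sends per second on average, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; False means 'leave the message in the queue'."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A worker calls `try_acquire()` before each send. When it returns False, the worker waits briefly and leaves the message where it is, so the queue absorbs the excess.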
For SMS especially, having a second provider configured per region keeps deliveries flowing when the primary has an outage. The worker tries the secondary after a small number of failed attempts on the primary.
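Here is a sketch of primary-then-secondary failover in Python. The two-attempt threshold, the `ProviderError` type, and the provider callables are assumptions. A real worker would also back off between attempts and pick the provider pair per region.

```python
from typing import Callable, Tuple


class ProviderError(Exception):
    """Hypothetical transient failure raised by a provider client."""


def send_with_failover(message: dict,
                       primary: Callable[[dict], str],
                       secondary: Callable[[dict], str],
                       max_primary_attempts: int = 2) -> Tuple[str, str]:
    """Try the primary provider a few times, then hand the message to the secondary.

    Returns (which provider sent it, provider message id). If the secondary also
    raises, the error propagates and the normal retry/DLQ path takes over.
    """
    for _ in range(max_primary_attempts):
        try:
            return "primary", primary(message)
        except ProviderError:
            continue  # a real worker would back off before the next attempt
    return "secondary", secondary(message)
```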
Providers fail. Networks blip. The system's job is to know which failures are worth retrying, how long to wait between tries, and what to do with the messages that never go through.
4xx errors usually mean the message itself is bad — a wrong token, an unverified sender, a malformed payload. Retrying them just wastes provider quota. Only 5xx, timeouts, and explicit 429s are retried.
Each retry waits roughly twice as long as the previous one, with a random jitter so a fleet of workers does not all re-hit a wounded provider on the same second.
After five or six tries the message lands in a dead-letter queue. Engineers get an alert if the DLQ grows beyond a small baseline, and a replay tool can re-submit messages once the underlying problem is fixed.
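The retry policy from the last three paragraphs can be sketched in Python. The sketch classifies each response, backs off exponentially with jitter, and dead-letters whatever still fails after six attempts. It uses the "equal jitter" variant, where half of each delay is fixed and half is random. Several choices here are assumptions: the 0.5-second base, the 60-second cap, the convention that `None` means a timeout, and recording permanent 4xx failures as rejected rather than dead-lettering them.

```python
import random
import time
from typing import Callable, List, Optional


def is_retryable(status: Optional[int]) -> bool:
    # None stands for a timeout. 429 and 5xx are transient; other 4xx mean the message is bad.
    return status is None or status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    # Equal jitter: the ceiling doubles each attempt; half of it is fixed, half random.
    ceiling = min(cap, base * 2 ** attempt)
    return ceiling / 2 + random.uniform(0, ceiling / 2)


def deliver(message: dict,
            send: Callable[[dict], Optional[int]],
            dead_letters: List[dict],
            max_attempts: int = 6,
            sleep: Callable[[float], None] = time.sleep) -> str:
    for attempt in range(max_attempts):
        status = send(message)
        if status is not None and 200 <= status < 300:
            return "delivered"
        if not is_retryable(status):
            return "rejected"  # assumption: permanent failures are recorded, not dead-lettered
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    dead_letters.append(message)  # retries exhausted: park it for alerting and replay
    return "dead_lettered"
```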
Distributed systems retry. Caller scripts retry. Queues redeliver. The notification system must guarantee that the user sees each logical message at most once, no matter how many times its components try.
Every notification carries a stable key chosen by the producer. For a shipped order it might be order-A-7710-shipped. When a second request arrives with the same key, the API treats it as a duplicate: it enqueues nothing and returns the original response.
At the API, the system remembers each idempotency_key for 24–48 hours. At the worker, it checks again on (user_id, channel, key), to catch redeliveries from the queue itself.
The key should reflect the logical event, not the request. Two different services sending an "order shipped" notification for the same order must produce the same key, so the second send is correctly suppressed.
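Here is a minimal Python sketch of the API-level and worker-level checks. `IdempotencyStore` is an in-memory stand-in for a shared store with per-key expiry. In production the check-and-set must be atomic across API instances, for example Redis `SET` with the `NX` and `EX` options. `api_accept` and `worker_should_send` are hypothetical names, and the 48-hour TTL is the upper end of the range above.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class IdempotencyStore:
    """In-memory stand-in for a shared key-value store with per-key expiry."""

    def __init__(self, ttl_seconds: float = 48 * 3600,
                 clock: Callable[[], float] = time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def put_if_absent(self, key: Hashable, value: Any) -> Optional[Any]:
        """Store value unless a live entry exists; return that existing value if so."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        self._entries[key] = (value, now + self.ttl)
        return None


api_keys = IdempotencyStore()
worker_keys = IdempotencyStore()


def api_accept(request: dict, enqueue: Callable[[dict], None]) -> dict:
    response = {"status": "accepted", "key": request["idempotency_key"]}
    original = api_keys.put_if_absent(request["idempotency_key"], response)
    if original is not None:
        return original  # duplicate: same answer as the first time, nothing enqueued
    enqueue(request)
    return response


def worker_should_send(user_id: str, channel: str, key: str) -> bool:
    # Second line of defence against the queue redelivering the same message.
    return worker_keys.put_if_absent((user_id, channel, key), True) is None
```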
Even the most perfectly delivered notification is wrong if the user did not want it. The system enforces preferences centrally so every channel and every event respects them.
A user can disable a whole channel — "no SMS, ever" — at the account level. The API drops any send targeted at that channel before it reaches the queue.
Finer-grained controls let users keep transactional messages while muting marketing. Each event is tagged with a topic, and the topic is checked against the user's preferences at send time.
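Both checks reduce to set lookups at send time. In this Python sketch, the `Preferences` type, the `TOPIC_OF_EVENT` mapping, and the `allowed` function are hypothetical. The sketch assumes unknown events have already been rejected by the API.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical event-to-topic tags; every event the API accepts has one.
TOPIC_OF_EVENT = {"order.shipped": "transactional", "promo.weekly": "marketing"}


@dataclass
class Preferences:
    disabled_channels: Set[str] = field(default_factory=set)  # e.g. {"sms"}: "no SMS, ever"
    muted_topics: Set[str] = field(default_factory=set)       # e.g. {"marketing"}


def allowed(prefs: Preferences, channel: str, event: str) -> bool:
    """True if the send survives the channel opt-out and topic checks; False means drop it."""
    if channel in prefs.disabled_channels:
        return False
    return TOPIC_OF_EVENT[event] not in prefs.muted_topics
```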
Notifications scheduled during a user's local night-time are held until morning, with an exception for high-priority transactional events the user has explicitly allowed.
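Quiet hours amount to computing a release time in the user's zone. This Python sketch assumes a fixed 22:00 to 08:00 window and takes a `tzinfo`. In practice that value would come from the user's profile, e.g. `zoneinfo.ZoneInfo("Europe/Berlin")`. `release_time` and the `urgent_allowed` flag are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

# Assumed quiet window: 22:00 to 08:00 in the user's local time.
QUIET_START = time(22, 0)
QUIET_END = time(8, 0)


def release_time(now_utc: datetime, user_tz: tzinfo, priority: str,
                 urgent_allowed: bool) -> datetime:
    """Return the UTC instant at which the notification may be sent."""
    if priority == "transactional" and urgent_allowed:
        return now_utc  # the user explicitly allowed these through quiet hours
    local = now_utc.astimezone(user_tz)
    if QUIET_END <= local.time() < QUIET_START:
        return now_utc  # daytime: send immediately
    wake = local.replace(hour=QUIET_END.hour, minute=QUIET_END.minute,
                         second=0, microsecond=0)
    if local.time() >= QUIET_START:
        wake += timedelta(days=1)  # late evening: hold until tomorrow morning
    return wake.astimezone(timezone.utc)
```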
A counter per (user, channel) ensures we never send more than, say, three marketing pushes in a day or one promotional email per week — regardless of how many campaigns target them.
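Here is a fixed-window version of that counter in Python, using the example caps from the text: three marketing pushes per calendar day and one marketing email per ISO week. `CAPS` and `FrequencyCap` are hypothetical names. A shared counter store with expiry would replace the in-memory dict. Keying on the window as well as on (user, channel) is this sketch's own choice.

```python
from collections import defaultdict
from datetime import date

# Hypothetical caps mirroring the examples above:
# channel -> (max marketing sends, function mapping a date to its window id)
CAPS = {
    "push": (3, lambda d: d.isoformat()),               # per calendar day
    "email": (1, lambda d: tuple(d.isocalendar())[:2]),  # per ISO (year, week)
}


class FrequencyCap:
    def __init__(self) -> None:
        # (user_id, channel, window_id) -> sends so far; a real store would expire old windows
        self._counts = defaultdict(int)

    def try_consume(self, user_id: str, channel: str, topic: str, today: date) -> bool:
        """Count one send if it fits under the cap; False means skip it."""
        if topic != "marketing" or channel not in CAPS:
            return True  # in this sketch, caps apply to marketing sends only
        limit, window_of = CAPS[channel]
        key = (user_id, channel, window_of(today))
        if self._counts[key] >= limit:
            return False
        self._counts[key] += 1
        return True
```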
Laws like GDPR, CAN-SPAM, and TCPA add a hard floor: explicit consent for marketing channels, a working unsubscribe link in every email, and full deletion of preference history on request. The preference service is the audit trail.
Every send emits a stream of events that flow into an analytics pipeline. Without this loop the system is operating blind — and channel routing decisions stay stuck on guesswork.
The provider has accepted the message and, where the platform reports it, the device has received it. Recorded from a webhook callback or a final status check.
The user has seen the notification — tapped a push, opened an email (tracked by a tiny image pixel), or expanded an in-app card.
The user followed the call to action. This is the conversion signal that marketing and product teams actually optimise for.
A bounced email triggers cleanup of the address. A push with an "unregistered token" response removes the token from the user's device list. A consistently ignored channel earns lower priority next time.
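Here is a Python sketch of closing the loop on provider feedback. The event names are hypothetical: `email.bounced`, `push.unregistered`, and an `ignored` event emitted when a delivered message is never opened. The `Contact` shape is hypothetical too. Real systems usually distinguish hard from soft bounces before discarding an address.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Contact:
    email: Optional[str]
    push_tokens: Set[str] = field(default_factory=set)
    ignored_streak: Counter = field(default_factory=Counter)  # channel -> unopened sends in a row


def apply_feedback(contact: Contact, event: dict) -> None:
    kind = event["type"]
    if kind == "email.bounced":
        contact.email = None                         # stop mailing a dead address
    elif kind == "push.unregistered":
        contact.push_tokens.discard(event["token"])  # drop the stale install token
    elif kind == "ignored":
        contact.ignored_streak[event["channel"]] += 1
    elif kind == "opened":
        contact.ignored_streak[event["channel"]] = 0


def channel_order(contact: Contact, channels: List[str]) -> List[str]:
    """Prefer channels the user has not been ignoring lately; ties keep their order."""
    return sorted(channels, key=lambda c: contact.ignored_streak[c])
```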
Email opens come from a tracking pixel that many clients block, so the metric is a lower bound. Push opens are more reliable because the OS reports tap events directly through the SDK.
Realtime metrics live in a fast streaming database for on-call dashboards. The deeper analytical work — campaign A/B tests, cohort retention — runs in the warehouse on the same event stream.
Six ideas that keep the system reliable as it scales, and respectful as it grows louder.
The queue is the heart of the design. Producers stay fast, providers stay within their limits, and the system survives spikes that neither end could absorb alone.
Channel workers are the only places that know about APNs payload sizes, Twilio carrier filters, or SES warm-up rules. Everything inside speaks one clean shape.
Exponential backoff with jitter for 5xx, 429s, and timeouts. A DLQ with tooling and alerts for anything that cannot be recovered automatically. Never silently drop a message.
Make the producer pick a stable key per logical event. Check it at the API and again at the worker. The user sees the message at most once even when the system retries five times.
Opt-outs, topic mutes, quiet hours, and frequency caps live in the request path, not in afterthought filters. The right notification at the wrong time is still a wrong notification.
Every delivery emits an event. Those events power dashboards, frequency caps, token cleanup, and the routing decisions that pick the best channel for next time.