Introduction
The interviewer says: "Design WhatsApp."
You think, "Store messages in a database, send them between users." Sounds manageable.
Then they follow up: "Alice is connected to Server A in Virginia. Bob is connected to Server B in Frankfurt. How does Alice's message reach Bob in under 200ms? What about a group chat with 200 people? How do you show who's online without hammering your database?"
And now you realize this isn't a CRUD app. It's a distributed real-time messaging system with stateful connections, fan-out challenges, and presence tracking.
Here's how to design one that actually holds up in an interview.
Functional Requirements
1. One-on-one messaging
- Users send text messages to each other in real-time
- Messages are delivered instantly when both users are online
- Messages are stored and delivered when the recipient comes back online
- This requires persistent connections because HTTP polling doesn't work at this scale. Use WebSockets.
2. Group chat
- Support groups up to 256 members
- A message sent to a group is delivered to all members
- Server-side fan-out: the client sends one message, the server distributes it to all group members
- Need to track who has received and read each message
3. Online presence (last seen)
- Show whether a user is "Online" or their "Last Seen" timestamp
- Dedicated Presence Service using Redis with TTL-based heartbeats
- When a user's heartbeat stops arriving, their TTL expires and they go "offline"
That's the core. A chat application delivers messages in real-time between users, handles offline delivery, and tracks presence.
For a deep dive on WebSockets vs HTTP and how real-time connections work, see WebSockets & Real-Time Communication.
Non-Functional Requirements
Low latency
- Messages must arrive in under 200ms for a conversational experience
- Minimize network hops between sender and receiver
- WebSocket connections eliminate HTTP overhead, with no repeated handshakes
Message ordering
- Messages must appear in the exact order they were sent
- If Alice says "Hi" then "How are you?", they can't arrive in reverse order
- Use monotonically increasing IDs (Snowflake IDs) for correct ordering
- Partition messages by chat/conversation to maintain per-conversation ordering
For more on availability vs consistency trade-offs in distributed systems, see CAP Theorem.
API Design
Chat systems use two transport protocols:
WebSocket Events (Real-time messaging)
Connection: WS /chat/connect?userId={id}&token={authToken}
Response: 101 Switching Protocols
Events:
sendMessage: { chatId, content, timestamp }
receiveMessage: { messageId, chatId, senderId, content, timestamp }
typing: { chatId, userId }
ack: { messageId, status: "delivered" | "read" }
REST API (Chat history)
GET /api/v1/chats/{chatId}/messages?cursor={timestamp}&limit=20
Response:
{
"messages": [
{
"id": "msg-1",
"content": "Hello",
"senderId": "user-1",
"timestamp": "2024-01-01T12:00:00Z",
"status": "read"
}
],
"nextCursor": "2024-01-01T11:00:00Z"
}
Status: 200 OK
Why two protocols?
- WebSockets for live messaging: instant, bidirectional, low overhead
- REST for history: cursor-based pagination, loads old messages when opening a chat
- Real-time messages go through WebSocket; historical messages are fetched via REST
Cursor-based pagination:
- Don't use offset pagination for chat history (inserting new messages shifts offsets)
- Use timestamp or message ID as cursor
- Client sends "give me 20 messages before this cursor"
High Level Design
Here's the overall architecture:
Key Components
1. Load Balancer
- Distributes incoming client connections across WebSocket Gateway instances
- Once a WebSocket connection is established, it stays on that gateway for its lifetime
- Ensures even distribution of connections across available gateways
2. WebSocket Gateway
- Manages persistent WebSocket connections
- Each gateway server holds thousands of active connections
- Stateful: Alice's connection is on a specific gateway server
- "Dumb pipe" that just manages connections and routes messages. No business logic here.
3. Chat Service
- Stateless service that handles message processing
- Receives messages from the gateway, persists them, routes to recipients
- Handles group chat fan-out: one incoming message -> N outgoing messages
- Publishes messages to Redis Pub/Sub for cross-server delivery
- Separated from the gateway so it can scale independently
4. Message Store (Cassandra)
- Wide-column NoSQL database optimized for write-heavy workloads
- Chat generates billions of small messages, and SQL databases struggle here
- Partition key:
chatId, so all messages in a conversation are co-located - Clustering key:
timestamp, keeping messages sorted chronologically within each partition - For more on SQL vs NoSQL trade-offs, see Databases & Caching.
5. Redis Pub/Sub
- Handles cross-server message routing between gateways
- Each gateway subscribes to channels for its connected users
- Chat Service publishes to the recipient's channel; the correct gateway receives and pushes to the client
- Solves the core distributed problem: Alice is on Gateway A, Bob is on Gateway B
6. Presence Service + Redis
- Tracks which users are online using heartbeat + TTL pattern
- User connects -> SET userId "online" with 60-second TTL in Redis
- Client sends heartbeat every 30 seconds -> refresh TTL
- No heartbeat for 60 seconds -> TTL expires -> user is "offline"
- Fast: checking presence = single Redis GET (~0.1ms)
Why This Architecture
Why a Load Balancer in front of the Gateway?
With multiple gateway instances, clients need a way to connect to one of them. The load balancer distributes connections evenly. Once a WebSocket is established, the connection stays on that gateway for its lifetime, so the LB only matters at connection time.
Why WebSocket Gateway is separated from Chat Service?
The gateway is stateful (holds connections), the chat service is stateless (processes messages). Separating them means you can scale each independently. You need more gateways for more connections, and more chat service instances for more message processing.
For more on horizontal vs vertical scaling patterns, see Scaling.
Why Cassandra (not PostgreSQL)?
Chat apps are extremely write-heavy. Billions of small messages per day. Cassandra handles this with distributed writes across nodes. PostgreSQL would need aggressive sharding, and single-node writes become a bottleneck. Cassandra is purpose-built for this access pattern.
Why Redis Pub/Sub for cross-server routing?
Alice is on Gateway A, Bob is on Gateway B. The Chat Service needs to get Alice's message to Bob's gateway. Redis Pub/Sub provides a lightweight publish/subscribe mechanism where each gateway subscribes to channels for its connected users. When the Chat Service publishes to Bob's channel, Gateway B picks it up and pushes to Bob's WebSocket.
Why Redis for presence (not the database)?
Presence is ephemeral. It changes constantly and doesn't need durability. If Redis crashes, presence just rebuilds as users reconnect. Storing presence in a database would add unnecessary write load for data that's stale within seconds anyway.
Detailed Design
Message Flow (1-on-1)
Alice sends "Hello" to Bob:
1. Alice's client sends message via WebSocket to Gateway A
2. Gateway A forwards to Chat Service
3. Chat Service:
a. Generate message ID (Snowflake ID for ordering)
b. Persist to Cassandra (chatId partition)
c. Look up Bob's gateway: Redis -> "Bob is on Gateway B"
d. Send message to Gateway B
4. Gateway B pushes message to Bob's WebSocket
5. Bob's client sends ACK -> Chat Service marks as "delivered"
6. Bob opens chat -> Chat Service marks as "read"
This is the critical path. Every message goes through persist-then-route. If the server crashes between step 3b and 3d, the message is safe in Cassandra and Bob gets it on reconnect.
Message Flow (Group Chat)
Alice sends "Hello" to Group-1 (200 members):
1. Alice's client sends message via WebSocket to Gateway A
2. Gateway A forwards to Chat Service
3. Chat Service:
a. Persist message once to Cassandra (chatId: Group-1)
b. Look up all group members' gateway connections
c. Fan-out: send to each member's gateway
4. Each gateway pushes to its connected members
5. Offline members: message stored in Cassandra, delivered on reconnect
Fan-out optimization:
- For small groups (< 256 members): server-side fan-out is fine
- Store the message once, deliver to each connected member
- Offline members fetch missed messages via REST API on reconnect
Cross-Server Message Routing
The key challenge: Alice is on Gateway A, Bob is on Gateway B. How does the message get across?
Option 1: Connection Registry (Redis)
userId -> gatewayServerId mapping in Redis
Chat Service looks up Bob's gateway, routes directly
Option 2: Redis Pub/Sub
Each gateway subscribes to channels for its connected users
Chat Service publishes to Bob's channel
Bob's gateway receives and pushes to Bob
Both work. Connection registry is simpler for small scale. Pub/Sub scales better with many gateways. In an interview, mention both and explain the trade-off. That's what separates strong answers from average ones.
Offline Message Handling
1. Alice sends message to Bob
2. Chat Service checks: Is Bob online? (Redis presence check)
3. Bob is offline:
a. Message persisted to Cassandra (always happens)
b. No real-time delivery attempted
4. Bob comes back online:
a. Client connects via WebSocket
b. Client sends last-received message ID
c. REST call: GET /chats/{chatId}/messages?cursor={lastMessageId}
d. Server returns all messages after that cursor
The key insight: messages are always persisted first, regardless of online status. The real-time push is an optimization, not the source of truth. Cassandra is the source of truth.
Delivery Receipts
Message statuses:
"sent" -> Server received and persisted the message
"delivered" -> Recipient's device received the message (ACK)
"read" -> Recipient opened the chat (read event)
Flow:
Alice sends -> Server stores (sent)
Server pushes to Bob -> Bob's device ACKs (delivered)
Bob opens chat -> Client sends read event (read)
Each status update is pushed back to Alice via WebSocket
This gives you the double-check-mark behavior. One check for sent, two checks for delivered, blue checks for read. The interviewer will appreciate you explaining the mechanics behind a feature they use every day.
Data Model (Cassandra)
Table: messages
Partition Key: chat_id
Clustering Key: message_id (Snowflake, time-ordered)
chat_id | message_id | sender_id | content | timestamp | status
chat-1 | 1001 | alice | "Hello" | 12:00:01 | read
chat-1 | 1002 | bob | "Hi!" | 12:00:02 | delivered
All messages for a conversation are stored together (same partition). Reading chat history is a single partition scan, which is extremely fast in Cassandra. This is exactly the access pattern Cassandra was designed for.
For a structured approach to covering these design decisions, see System Design Structure.
Common Interview Mistakes
Mistake 1: Using HTTP polling for real-time messaging
"Clients poll every second for new messages."
Problem: 10 million users polling every second = 10 million requests/second. Most return nothing. This doesn't scale.
Better: WebSocket connections. Server pushes messages instantly. Zero wasted requests.
Mistake 2: Storing messages in a SQL database
"I'll use PostgreSQL for message storage."
Problem: Chat apps generate billions of writes per day. A single PostgreSQL instance can't handle this without aggressive sharding.
Better: Cassandra or HBase. Purpose-built for write-heavy, time-series-like data with distributed writes.
Mistake 3: Ignoring the cross-server routing problem
"The WebSocket server receives the message and sends it to the recipient."
Problem: Alice and Bob are on different servers. How does Server A send to Server B?
Better: Use a connection registry in Redis or Redis Pub/Sub for cross-server message routing.
Mistake 4: Not handling offline users
"Users are always connected."
Problem: Mobile devices lose connectivity constantly. Messages sent to offline users would be lost.
Better: Always persist to Cassandra first. On reconnect, client fetches missed messages via REST using cursor-based pagination.
Mistake 5: Storing presence in the database
"I'll update a 'last_seen' column in the users table."
Problem: With 10 million active users sending heartbeats every 30 seconds, that's 330K writes/second to your user table. Your database melts.
Better: Redis with TTL. Ephemeral data belongs in ephemeral storage. If Redis crashes, presence rebuilds as users reconnect.
Interview golden rule:
Don't just say "I'll use WebSockets for chat." Explain the gateway architecture, how messages route across servers, what happens when users are offline, and why you chose Cassandra over SQL for message storage.