#Introduction
The interviewer says: "Design Google Docs."
You say the client can save the whole document every few seconds. Then the follow-up arrives: "What happens when two people type at the same cursor position? What if they are connected to different servers? How do other users see live cursors?"
A collaborative editor is a realtime state synchronization system. The hard parts are conflict resolution, low-latency fan-out, and separating durable document content from ephemeral presence state.
#Functional Requirements
1. Concurrent editing
- Multiple users can edit the same document at the same time
- Edits must not overwrite each other
- Every client should eventually converge to the same document state
The key decision is conflict resolution. Two common approaches are covered in more depth in Operational Transform vs CRDTs:
- Operational Transform (OT): transform operations against concurrent operations before applying them
- CRDT (conflict-free replicated data type): choose a data type whose concurrent operations merge deterministically on every replica, without a central transformation step
The interview answer should not stop at "use WebSockets." WebSockets move messages quickly, but they do not resolve conflicts by themselves.
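To make the OT bullet concrete, here is a minimal Python sketch of two users inserting at the same position. The `Insert` type, the `site_id` tie-breaker, and the `transform` function are illustrative assumptions rather than the API of any particular OT library, and the sketch handles inserts only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    position: int  # index in the document where the text goes
    text: str
    site_id: str   # hypothetical tie-breaker for inserts at the same position


def apply(doc: str, op: Insert) -> str:
    return doc[:op.position] + op.text + doc[op.position:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    shifts = against.position < op.position or (
        against.position == op.position and against.site_id < op.site_id
    )
    if shifts:
        return Insert(op.position + len(against.text), op.text, op.site_id)
    return op


base = "Hello world"
a = Insert(6, "shared ", "alice")
b = Insert(6, "big ", "bob")

# Each replica applies its local operation first, then the transformed remote one.
replica_1 = apply(apply(base, a), transform(b, a))
replica_2 = apply(apply(base, b), transform(a, b))
print(replica_1)
print(replica_2)
```

Both replicas end with the same text even though they applied the operations in opposite orders. The deterministic tie-breaker is what keeps same-position inserts from diverging.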
2. Presence and cursors
- Users can see who is online in the document
- Users can see other participants' cursors and selections
- Cursor movement should update quickly but does not need durable storage
Presence should be treated separately from document edits. Text operations are durable. Cursor positions are ephemeral state and can expire after disconnect or heartbeat timeout.
#Non-Functional Requirements
Low edit latency
Typing should feel live. A common target is for remote collaborators to see an edit within roughly 100 to 200 ms. Use persistent WebSocket connections, keep gateway hops short, and route users to a nearby collaboration region when possible.
High fan-out efficiency
One accepted edit may need to reach 100 active viewers. The server should publish the edit once to a document room, then fan it out to subscribed gateway instances using the same basic fan-out strategies that appear in chat and notification systems.
Convergence
Clients may receive operations in different orders during retries, reconnects, or cross-region routing. The merge algorithm, together with the server-ordered operation log, must still bring every client to the same document state.
Recovery
Store accepted operations and periodic snapshots. A reconnecting client can load a snapshot, replay missing operations, and rejoin the live stream.
#API Design
Load document snapshot
GET /api/v1/documents/doc_123
Response:
{
  "documentId": "doc_123",
  "content": "Hello world",
  "version": 481,
  "collaborators": [{ "userId": "user_1", "name": "Ava" }]
}
The client needs the current content and version before joining the realtime operation stream.
Submit edit operation
POST /api/v1/documents/doc_123/operations
Request:
{
  "baseVersion": 481,
  "clientOperationId": "op_client_9",
  "operation": {
    "type": "insert",
    "position": 6,
    "text": "shared "
  }
}
Response:
{
  "accepted": true,
  "serverVersion": 482,
  "transformedOperation": {
    "type": "insert",
    "position": 6,
    "text": "shared "
  }
}
In production, the same operation shape usually travels over the WebSocket session. The HTTP endpoint is still useful for API design because it forces the candidate to name the payload, version metadata, idempotency key, and error cases.
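One way the server might use `baseVersion` is sketched below in Python. The `Document` class and `submit_insert` function are hypothetical names, the sketch handles inserts only, and it assumes the in-memory log still holds every operation newer than the client's base version.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    content: str
    version: int
    log: list = field(default_factory=list)  # (server_version, position, text)


def submit_insert(doc: Document, base_version: int, position: int, text: str) -> dict:
    if base_version > doc.version:
        return {"accepted": False, "error": "base_version_ahead_of_server"}
    # Shift the position past every accepted insert the client had not seen.
    # Ties go to the operation that was accepted first.
    for server_version, other_position, other_text in doc.log:
        if server_version > base_version and other_position <= position:
            position += len(other_text)
    doc.content = doc.content[:position] + text + doc.content[position:]
    doc.version += 1
    doc.log.append((doc.version, position, text))
    return {
        "accepted": True,
        "serverVersion": doc.version,
        "transformedOperation": {"type": "insert", "position": position, "text": text},
    }


doc = Document(content="Hello world", version=481)
first = submit_insert(doc, base_version=481, position=6, text="shared ")
# A second client also based on version 481 appends "!" at the old end of the text.
second = submit_insert(doc, base_version=481, position=11, text="!")
print(doc.content, doc.version)
```

The second operation arrives with a stale base version, so the server moves its position from 11 to 18 before applying it and returns the transformed operation to the client.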
#High Level Design
The client connects through a load balancer to a WebSocket gateway. The gateway owns long-lived sockets and document room membership, but it does not own merge rules.
The collaboration service validates permissions, sequences edits, and passes concurrent operations through an OT or CRDT engine. Accepted operations are persisted to the document store and published to pub/sub so every gateway with viewers in the same document can broadcast the update.
Presence flows through a separate presence service backed by Redis keys with short TTLs. This keeps cursor movement fast and prevents high-churn cursor state from polluting the durable document log.
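The expiry behavior can be sketched without Redis. The Python class below is an in-memory stand-in, not a Redis client: `PresenceStore`, `heartbeat`, and `active` are hypothetical names, and the injected clock exists only to make the example deterministic.

```python
import time


class PresenceStore:
    """In-memory stand-in for a TTL-backed presence store (assumption, not Redis)."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (doc_id, user_id) -> (cursor, expires_at)

    def heartbeat(self, doc_id, user_id, cursor):
        # Every heartbeat or cursor move refreshes the expiry.
        self._entries[(doc_id, user_id)] = (cursor, self._clock() + self._ttl)

    def active(self, doc_id):
        now = self._clock()
        # Drop entries whose TTL has passed, then report who is left.
        self._entries = {k: v for k, v in self._entries.items() if v[1] > now}
        return {
            user_id: cursor
            for (entry_doc, user_id), (cursor, _) in self._entries.items()
            if entry_doc == doc_id
        }


now = [0.0]  # fake clock so the example does not need to sleep
store = PresenceStore(ttl_seconds=10.0, clock=lambda: now[0])
store.heartbeat("doc_123", "user_1", cursor=6)
now[0] = 5.0
store.heartbeat("doc_123", "user_2", cursor=11)
now[0] = 12.0  # user_1 has sent no heartbeat for 12 seconds
visible = store.active("doc_123")
print(visible)
```

Nothing here touches the operation log: a user who stops sending heartbeats simply disappears from the presence view once the TTL passes.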
#Detailed Design
Operation log and snapshots
Persist every accepted operation with:
- document id
- server version
- client operation id
- author id
- operation payload
- timestamp
Periodically compact the operation log into snapshots. A client loading an old document should not replay millions of operations from the beginning.
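A minimal Python sketch of loading from a snapshot and replaying the tail of the log is shown below. The record shape follows the API examples above, while `load_document`, `apply_operation`, and the `delete` operation with a `length` field are assumptions added for illustration.

```python
def apply_operation(content: str, op: dict) -> str:
    if op["type"] == "insert":
        return content[:op["position"]] + op["text"] + content[op["position"]:]
    if op["type"] == "delete":  # hypothetical operation type, not shown in the API above
        return content[:op["position"]] + content[op["position"] + op["length"]:]
    raise ValueError(f"unknown operation type: {op['type']}")


def load_document(snapshot: dict, log: list) -> dict:
    """Start from the snapshot and replay only operations newer than it."""
    content, version = snapshot["content"], snapshot["version"]
    for entry in sorted(log, key=lambda e: e["serverVersion"]):
        if entry["serverVersion"] <= version:
            continue  # already folded into the snapshot
        if entry["serverVersion"] != version + 1:
            raise ValueError("gap in operation log")
        content = apply_operation(content, entry["operation"])
        version = entry["serverVersion"]
    return {"content": content, "version": version}


snapshot = {"content": "Hello world", "version": 481}
log = [
    {"serverVersion": 482, "operation": {"type": "insert", "position": 6, "text": "shared "}},
    {"serverVersion": 483, "operation": {"type": "delete", "position": 0, "length": 6}},
]
current = load_document(snapshot, log)
print(current)
```

Compaction is then a matter of storing the result of `load_document` as the new snapshot and trimming log entries at or below its version.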
Idempotency
Clients retry when networks fail. The server should deduplicate by (document_id, client_operation_id) so the same keystroke is not applied twice.
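A sketch of that deduplication in Python follows. `OperationDeduplicator` and `apply_once` are hypothetical names, and the in-memory dictionary stands in for what would more likely be a unique constraint or keyed lookup in the document store.

```python
class OperationDeduplicator:
    def __init__(self):
        # (document_id, client_operation_id) -> response returned the first time
        self._results = {}

    def apply_once(self, document_id, client_operation_id, apply_fn):
        key = (document_id, client_operation_id)
        if key in self._results:
            return self._results[key]  # retry: replay the stored response
        result = apply_fn()
        self._results[key] = result
        return result


applied = []


def apply_keystroke():
    applied.append("shared ")
    return {"accepted": True, "serverVersion": 482}


dedupe = OperationDeduplicator()
first = dedupe.apply_once("doc_123", "op_client_9", apply_keystroke)
retry = dedupe.apply_once("doc_123", "op_client_9", apply_keystroke)
print(first == retry, len(applied))
```

Returning the original response on a retry matters as much as skipping the write, because the client still needs the `serverVersion` it missed.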
Document rooms
Each active document maps to a room. Gateways subscribe to that room while they have at least one connected client viewing the document.
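The subscribe-on-first-viewer rule can be sketched in Python as follows. `PubSub` is an in-process stand-in for a real broker, and the `Gateway` class and its method names are illustrative assumptions.

```python
from collections import defaultdict


class PubSub:
    """In-process stand-in for a pub/sub broker keyed by document id."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # doc_id -> subscribed gateways

    def subscribe(self, doc_id, gateway):
        self._subscribers[doc_id].add(gateway)

    def unsubscribe(self, doc_id, gateway):
        self._subscribers[doc_id].discard(gateway)

    def publish(self, doc_id, message):
        # One publish per accepted edit; each gateway fans out to its own sockets.
        for gateway in list(self._subscribers[doc_id]):
            gateway.deliver(doc_id, message)


class Gateway:
    def __init__(self, name, bus):
        self.name = name
        self._bus = bus
        self._rooms = defaultdict(set)  # doc_id -> locally connected client ids
        self.sent = []                  # (client_id, message) pairs, for the demo

    def join(self, doc_id, client_id):
        if not self._rooms[doc_id]:
            self._bus.subscribe(doc_id, self)  # first local viewer
        self._rooms[doc_id].add(client_id)

    def leave(self, doc_id, client_id):
        self._rooms[doc_id].discard(client_id)
        if not self._rooms[doc_id]:
            self._bus.unsubscribe(doc_id, self)  # last local viewer left

    def deliver(self, doc_id, message):
        for client_id in sorted(self._rooms[doc_id]):
            self.sent.append((client_id, message))


bus = PubSub()
gw1, gw2 = Gateway("gw-1", bus), Gateway("gw-2", bus)
gw1.join("doc_123", "ava")
gw1.join("doc_123", "ben")
gw2.join("doc_123", "cy")
bus.publish("doc_123", {"serverVersion": 482})
gw1.leave("doc_123", "ava")
gw1.leave("doc_123", "ben")
bus.publish("doc_123", {"serverVersion": 483})
```

After both of its clients leave, `gw-1` stops receiving traffic for the document, so the second publish reaches only the gateway that still has a viewer.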
Reconnect flow
When a client reconnects, it sends its last seen server version. The server returns missing operations or asks the client to reload a newer snapshot if the gap is too large.
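The decision the server makes on reconnect can be sketched as a small Python function. The name `plan_reconnect`, the action labels, and the `MAX_REPLAY_OPERATIONS` threshold are hypothetical; a real system would tune the threshold against snapshot size and log retention.

```python
MAX_REPLAY_OPERATIONS = 1000  # hypothetical cutoff for "gap is too large"


def plan_reconnect(last_seen_version, server_version, oldest_logged_version,
                   max_replay=MAX_REPLAY_OPERATIONS):
    """Decide how a reconnecting client should catch up."""
    gap = server_version - last_seen_version
    if gap < 0:
        # The client claims a version the server never issued; start over.
        return {"action": "reload_snapshot"}
    if gap == 0:
        return {"action": "resume"}
    log_was_trimmed = last_seen_version + 1 < oldest_logged_version
    if gap > max_replay or log_was_trimmed:
        return {"action": "reload_snapshot"}
    return {
        "action": "replay",
        "fromVersion": last_seen_version + 1,
        "toVersion": server_version,
    }


short_gap = plan_reconnect(481, 483, oldest_logged_version=400)
long_gap = plan_reconnect(100, 5000, oldest_logged_version=400)
up_to_date = plan_reconnect(483, 483, oldest_logged_version=400)
print(short_gap, long_gap, up_to_date)
```

The check against `oldest_logged_version` covers the case where compaction has already trimmed the operations the client would need to replay.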
#Common Interview Mistakes
- Treating live editing as normal CRUD
- Saying "use WebSockets" without explaining OT or CRDT conflict resolution
- Storing cursor movement in the primary document database
- Broadcasting through one server without explaining cross-gateway fan-out
- Ignoring reconnect, retry, and duplicate operation handling