#Introduction
You are designing Google Docs. Two users place their cursor after H in the word Hello.
Alice types i. Bob types ey. Both clients send their edit at almost the same time:
Base document: H ello
Alice operation: insert "i" at position 1
Bob operation: insert "ey" at position 1
If every participant just applies operations in the order it receives them, one client may show Hieyello while another shows Heyiello. The network delivered the same edits in different orders, and now users are looking at different documents.
That is the problem Operational Transform and CRDTs solve. WebSockets move the operations quickly. OT and CRDTs make sure the operations converge.
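The divergence is easy to reproduce. The sketch below assumes each client applies its own edit first and then the other user's edit exactly as received, with no transformation. The helper `apply_insert` is a hypothetical name used only for illustration.

```python
def apply_insert(doc: str, position: int, text: str) -> str:
    """Insert text at a character offset, with no conflict handling."""
    return doc[:position] + text + doc[position:]


base = "Hello"
alice = (1, "i")   # insert "i" at position 1
bob = (1, "ey")    # insert "ey" at position 1

# Alice's client: her own edit first, then Bob's edit as received.
alice_view = apply_insert(apply_insert(base, *alice), *bob)
# Bob's client: his own edit first, then Alice's edit as received.
bob_view = apply_insert(apply_insert(base, *bob), *alice)

print(alice_view)  # Heyiello
print(bob_view)    # Hieyello
```

Both clients saw the same two operations, yet they end up with different text because position 1 means something different after the first insert.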
#The Conflict Example
Collaborative editing is not normal CRUD.
In CRUD, the client often sends a new final value:
{ "document": "Hello world" }
That fails for live editing because every user is editing a slightly different local version. Instead, clients send operations:
{
  "baseVersion": 481,
  "clientOperationId": "op_9",
  "operation": {
    "type": "insert",
    "position": 1,
    "text": "i"
  }
}
The important fields are:
- baseVersion: the document version the client edited
- clientOperationId: an idempotency key for retries
- operation: the user's intent, not the whole document
When two operations are based on the same version, the system has to merge them. "Last write wins" is not acceptable because it drops someone's text.
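As a rough sketch, a server might parse that message into a typed envelope and compare `baseVersion` with its own version to decide whether a merge is needed. The class and function names here (`Envelope`, `InsertOp`, `parse_envelope`, `needs_merge`) are assumptions for illustration, and only the insert operation from the example is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsertOp:
    position: int
    text: str


@dataclass(frozen=True)
class Envelope:
    base_version: int          # "baseVersion" in the JSON message
    client_operation_id: str   # "clientOperationId" in the JSON message
    operation: InsertOp


def parse_envelope(message: dict) -> Envelope:
    """Convert the JSON-style dict into a typed envelope (inserts only)."""
    op = message["operation"]
    if op["type"] != "insert":
        raise ValueError(f"unsupported operation type: {op['type']}")
    return Envelope(
        base_version=message["baseVersion"],
        client_operation_id=message["clientOperationId"],
        operation=InsertOp(position=op["position"], text=op["text"]),
    )


def needs_merge(envelope: Envelope, server_version: int) -> bool:
    """True when the client edited an older version than the server holds."""
    return envelope.base_version < server_version


message = {
    "baseVersion": 481,
    "clientOperationId": "op_9",
    "operation": {"type": "insert", "position": 1, "text": "i"},
}
envelope = parse_envelope(message)
print(needs_merge(envelope, server_version=481))  # False
print(needs_merge(envelope, server_version=482))  # True
```

The second call models the case where another user's edit was committed first, so this operation can no longer be applied as written.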
#Operational Transform
Operational Transform (OT) keeps a central sequence of operations. When a stale operation arrives, the server transforms it against the operations that have been committed since the version it was based on.
Example:
Base: Hello
Alice: insert "i" at position 1
Bob: insert "ey" at position 1
Server receives Alice first:
v482 = H i ello
Bob's operation was based on v481, so it is now stale.
Alice's committed insert added one character at position 1, so the server shifts Bob's position from 1 to 2:
v483 = H i ey ello
The exact tie-breaking rule can vary between systems. The key is that it is deterministic. If Alice's operation wins the tie, Bob's insert moves after Alice's insert. If Bob's wins, Alice's insert moves after Bob's. Either way, everyone converges.
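A minimal sketch of an insert-versus-insert transform is shown below. It assumes ties at the same position are broken by comparing a hypothetical `client_id` field, with the lower id going first. That rule is an illustration, not the rule of any particular editor, and deletes are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    position: int
    text: str
    client_id: str  # hypothetical field, used only to break ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.position] + op.text + doc[op.position:]


def transform(incoming: Insert, committed: Insert) -> Insert:
    """Rewrite `incoming` so it can be applied after `committed`."""
    committed_goes_first = (
        committed.position < incoming.position
        or (committed.position == incoming.position
            and committed.client_id < incoming.client_id)
    )
    if not committed_goes_first:
        return incoming  # committed text lands after us, no shift needed
    shifted = incoming.position + len(committed.text)
    return Insert(shifted, incoming.text, incoming.client_id)


base = "Hello"
alice = Insert(1, "i", "alice")
bob = Insert(1, "ey", "bob")

# Alice committed first: Bob's insert is shifted from 1 to 2.
alice_first = apply(apply(base, alice), transform(bob, alice))
# Bob committed first: Alice still wins the tie, so her position stays at 1.
bob_first = apply(apply(base, bob), transform(alice, bob))

print(alice_first)  # Hieyello
print(bob_first)    # Hieyello
```

Both orders produce the same text because the tie-break depends only on the operations themselves, not on which one happened to arrive first.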
OT works well when there is a server-authoritative ordering point:
- browser-based document editors
- code editors with a central collaboration service
- systems where the server validates permissions before accepting edits
The server usually stores an operation log and periodic snapshots. Reconnecting clients load a snapshot, replay missing operations, and rejoin the live stream.
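The snapshot-plus-log idea can be sketched as follows. `DocumentStore`, `CommittedInsert`, and `replay` are hypothetical names, the log is kept in memory, and only inserts are modeled. A real service would persist both pieces and handle more operation types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CommittedInsert:
    version: int   # document version produced by this operation
    position: int
    text: str


@dataclass
class DocumentStore:
    snapshot_version: int
    snapshot_text: str
    log: List[CommittedInsert] = field(default_factory=list)

    def head_version(self) -> int:
        return self.log[-1].version if self.log else self.snapshot_version

    def commit(self, position: int, text: str) -> CommittedInsert:
        """Append an already-transformed insert to the log."""
        op = CommittedInsert(self.head_version() + 1, position, text)
        self.log.append(op)
        return op

    def ops_since(self, version: int) -> List[CommittedInsert]:
        return [op for op in self.log if op.version > version]


def replay(text: str, ops: List[CommittedInsert]) -> str:
    """Apply committed inserts in log order."""
    for op in ops:
        text = text[:op.position] + op.text + text[op.position:]
    return text


store = DocumentStore(snapshot_version=481, snapshot_text="Hello")
store.commit(1, "i")    # v482
store.commit(2, "ey")   # v483, position already transformed

# A reconnecting client loads the snapshot, then replays what it missed.
text = replay(store.snapshot_text, store.ops_since(store.snapshot_version))
print(store.head_version(), text)  # 483 Hieyello
```

After the replay the client is at the head version and can resume applying operations from the live stream.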
#CRDTs
CRDT stands for Conflict-free Replicated Data Type.
Instead of transforming operations through one central sequencer, a CRDT gives every operation enough metadata to merge in any order. That is useful when clients need to edit offline, sync peer-to-peer, or accept operations from multiple regions.
A simple text CRDT gives every inserted character a stable position identifier:
H e l l o
id:1 id:4 id:5 id:6 id:7
Alice inserts "i" between id:1 and id:4 -> id:2A
Bob inserts "ey" between id:1 and id:4 -> id:2B, id:3B
The CRDT defines a deterministic ordering for ids between the same neighbors. Every replica can receive Alice then Bob, or Bob then Alice, and still sort the characters the same way.
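The example above can be sketched with ids modeled as `(counter, replica)` tuples, so `id:2A` becomes `(2, "A")`. This is a toy scheme for illustration only. Integer counters cannot keep making room between neighbors indefinitely, so production text CRDTs use richer identifier schemes. `TextReplica` and its methods are hypothetical names.

```python
from typing import Dict, Tuple

ElementId = Tuple[int, str]  # (counter, replica), e.g. id:2A -> (2, "A")


class TextReplica:
    def __init__(self, initial: Dict[ElementId, str]) -> None:
        self.chars: Dict[ElementId, str] = dict(initial)

    def merge(self, inserts: Dict[ElementId, str]) -> None:
        """Add remote characters; re-delivering the same ids is harmless."""
        for element_id, char in inserts.items():
            self.chars.setdefault(element_id, char)

    def text(self) -> str:
        # Every replica sorts by the same ids, so order of arrival is irrelevant.
        return "".join(self.chars[i] for i in sorted(self.chars))


base = {(1, ""): "H", (4, ""): "e", (5, ""): "l", (6, ""): "l", (7, ""): "o"}
alice_insert = {(2, "A"): "i"}
bob_insert = {(2, "B"): "e", (3, "B"): "y"}

replica_1 = TextReplica(base)
replica_1.merge(alice_insert)
replica_1.merge(bob_insert)

replica_2 = TextReplica(base)
replica_2.merge(bob_insert)
replica_2.merge(alice_insert)

print(replica_1.text())  # Hieyello
print(replica_2.text())  # Hieyello
```

No transform step is needed here: the merge is a set union, and the shared sort order does the rest.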
CRDTs are strongest when:
- users can edit offline
- updates may sync through multiple paths
- there is no single reliable central sequencer
- availability matters more than keeping storage simple
The tradeoff is metadata. CRDT operations often carry more identifiers, tombstones, or causal metadata than OT operations. Text CRDTs can also need compaction to keep documents from growing forever.
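To make the metadata cost concrete, the sketch below models deletion with tombstones and a naive compaction step. `TombstoneReplica` and its methods are hypothetical. The sketch also assumes compaction only runs once every replica has seen the delete, and deciding when that is true is outside its scope.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

ElementId = Tuple[int, str]  # (counter, replica)


@dataclass
class Element:
    char: str
    deleted: bool = False  # tombstone flag


class TombstoneReplica:
    def __init__(self) -> None:
        self.elements: Dict[ElementId, Element] = {}

    def insert(self, element_id: ElementId, char: str) -> None:
        # If a delete for this id arrived first, the tombstone is kept.
        if element_id not in self.elements:
            self.elements[element_id] = Element(char)

    def delete(self, element_id: ElementId) -> None:
        # Mark rather than remove, so a late or repeated insert stays deleted.
        element = self.elements.setdefault(element_id, Element(char=""))
        element.deleted = True

    def text(self) -> str:
        ordered = (self.elements[i] for i in sorted(self.elements))
        return "".join(e.char for e in ordered if not e.deleted)

    def compact(self) -> None:
        """Drop tombstones. Assumed safe only after all replicas saw the deletes."""
        self.elements = {
            i: e for i, e in self.elements.items() if not e.deleted
        }


replica = TombstoneReplica()
for element_id, char in [((1, ""), "H"), ((2, "A"), "i"), ((4, ""), "e"),
                         ((5, ""), "l"), ((6, ""), "l"), ((7, ""), "o")]:
    replica.insert(element_id, char)

replica.delete((2, "A"))
print(replica.text(), len(replica.elements))  # Hello 6

replica.compact()
print(replica.text(), len(replica.elements))  # Hello 5
```

The visible text is five characters in both cases, but the replica stores six entries until compaction runs. That gap is the growth the paragraph above warns about.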
#Choosing Between OT and CRDT
Use OT when the product already depends on a central collaboration service.
That gives you:
- one authoritative operation order
- simpler permission checks
- easier audit logs
- smaller operation metadata
Use CRDTs when offline or multi-master editing is a first-class requirement.
That gives you:
- local edits without waiting for the server
- mergeable updates from different replicas
- better tolerance for disconnected clients
The wrong answer is picking one because it sounds more modern. The right answer starts with the product requirement.
| Requirement | Better fit |
|---|---|
| Google Docs-style online editing with central service | OT |
| Figma-style sync with rich local state and offline tolerance | CRDT or hybrid |
| Need strict server validation before edits commit | OT |
| Need peer-to-peer or multi-region merge without a single sequencer | CRDT |
Many production systems are hybrids. They may use server sequencing for document text, CRDT-like structures for comments or cursors, and regular database transactions for permissions.
#Common Interview Mistakes
Mistake 1: Saying "use WebSockets" and stopping.
WebSockets solve transport. They do not solve concurrent edits.
Mistake 2: Sending the entire document on every keystroke.
That creates huge payloads and makes conflict resolution harder. Send operations.
Mistake 3: Using last-write-wins for text.
Last-write-wins drops valid edits. It may be fine for a profile photo. It is not fine for a paragraph.
Mistake 4: Ignoring retries.
Clients retry operations after reconnect. Use client operation ids so the same keystroke is not applied twice.
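A minimal sketch of that deduplication, assuming client operation ids are unique within a document and the server keeps a map of ids it has already applied (`CollaborationServer` and `submit` are hypothetical names):

```python
from typing import Dict


class CollaborationServer:
    def __init__(self, text: str, version: int) -> None:
        self.text = text
        self.version = version
        # clientOperationId -> version at which the operation was committed
        self.applied: Dict[str, int] = {}

    def submit(self, client_operation_id: str, position: int, text: str) -> int:
        """Apply an insert once; a retry returns the original result."""
        if client_operation_id in self.applied:
            return self.applied[client_operation_id]
        self.text = self.text[:position] + text + self.text[position:]
        self.version += 1
        self.applied[client_operation_id] = self.version
        return self.version


server = CollaborationServer("Hello", version=481)
first = server.submit("op_9", 1, "i")
retry = server.submit("op_9", 1, "i")   # same keystroke resent after reconnect

print(first, retry, server.text)  # 482 482 Hiello
```

Without the `applied` map the retry would produce "Hiiello" at v483. In practice the map would need an expiry or compaction policy of its own.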
Mistake 5: Forgetting compaction.
Operation logs and CRDT metadata grow. Store snapshots and compact old history.
#Summary: What to Remember
Collaborative editing needs convergence, not just realtime transport.
Operational Transform transforms stale operations against already accepted operations. It fits server-authoritative editors where the collaboration service owns ordering.
CRDTs make operations mergeable in any order. They fit offline, peer-to-peer, and multi-master collaboration, but often carry more metadata.
In interviews, explain the conflict example, name the merge strategy, describe operation logs and snapshots, and call out idempotency for retries. That is the difference between "live CRUD" and a real collaborative editor.