Operational Transform vs CRDTs

#Introduction

You are designing Google Docs. Two users place their cursor after the "H" in "Hello".

Alice types "i". Bob types "ey". Both clients send their edits at almost the same time:

Base document: Hello (both cursors at position 1, between "H" and "e")

Alice operation: insert "i" at position 1
Bob operation:   insert "ey" at position 1

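If the server just relays operations in arrival order without adjusting them, each client applies its own edit first and the other user's edit second, still at position 1. Alice's screen shows Heyiello while Bob's shows Hieyello. Both replicas received the same two operations, but in different orders, and now the users are looking at different documents.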
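A few lines of Python reproduce the divergence. `apply_insert` is a hypothetical helper, and positions are 0-based character indexes, so position 1 is right after the "H".

```python
def apply_insert(doc: str, position: int, text: str) -> str:
    """Hypothetical helper: insert `text` at a 0-based character position."""
    return doc[:position] + text + doc[position:]

base = "Hello"
alice_op = (1, "i")   # insert "i" at position 1
bob_op = (1, "ey")    # insert "ey" at position 1

# Each client applies its own edit first, then the remote edit unchanged.
alice_view = apply_insert(apply_insert(base, *alice_op), *bob_op)
bob_view = apply_insert(apply_insert(base, *bob_op), *alice_op)

print(alice_view)  # Heyiello
print(bob_view)    # Hieyello
```

Same operations, different order, different documents: that is the convergence bug OT and CRDTs exist to fix.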

That is the problem Operational Transform and CRDTs solve. WebSockets move the operations quickly. OT and CRDTs make sure every replica ends up with the same document.

#The Conflict Example

Collaborative editing is not normal CRUD.

In CRUD, the client often sends a new final value:

{ "document": "Hello world" }

That fails for live editing because every user is editing a slightly different local version. Instead, clients send operations:

{
  "baseVersion": 481,
  "clientOperationId": "op_9",
  "operation": {
    "type": "insert",
    "position": 1,
    "text": "i"
  }
}

The important fields are:

baseVersion: the document version the client edited
clientOperationId: an idempotency key for retries
operation: the user's intent, not the whole document

When two operations are based on the same version, the system has to merge them. "Last write wins" is not acceptable because it drops someone's text.
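A minimal Python sketch of the envelope above and of how a server might find the committed operations a stale edit has to be merged against. `Insert`, `Envelope`, `parse_envelope`, and `ops_to_merge` are hypothetical names, only inserts are modelled, and the sketch assumes versions simply count committed operations, so `log[v]` is the operation that moved the document from version v to v + 1.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    position: int  # 0-based index; position 1 means "after the H in Hello"
    text: str

@dataclass(frozen=True)
class Envelope:
    base_version: int         # document version the client edited
    client_operation_id: str  # idempotency key for retries
    operation: Insert         # the user's intent, not the whole document

def parse_envelope(raw: str) -> Envelope:
    """Hypothetical parser for the JSON shape shown above (inserts only)."""
    data = json.loads(raw)
    op = data["operation"]
    if op["type"] != "insert":
        raise ValueError(f"unsupported operation type: {op['type']}")
    return Envelope(data["baseVersion"], data["clientOperationId"],
                    Insert(op["position"], op["text"]))

def ops_to_merge(log: list[Insert], env: Envelope) -> list[Insert]:
    """Committed operations the client had not seen when it made its edit.

    Assumes log[v] moved the document from version v to v + 1,
    so the current server version is len(log).
    """
    if env.base_version > len(log):
        raise ValueError("client claims a version the server has not produced")
    return log[env.base_version:]
```

An empty result means the edit was made against the latest version and can be applied as-is; anything else is a concurrent edit that needs a merge strategy.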

#Operational Transform

Operational Transform (OT) keeps a central sequence of operations. When a stale operation arrives, the server transforms it against operations that already committed.

Example:

Base:     Hello

Alice: insert "i" at position 1
Bob:   insert "ey" at position 1

Server receives Alice first:
  v482 = H i ello

Bob's operation was based on v481.
Server transforms Bob's position from 1 to 2:
  v483 = H i ey ello

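The tie-breaking rule varies between implementations. What matters is that it is deterministic. If Alice's operation wins the tie, Bob's insert moves after hers. If Bob's wins, Alice's insert moves after his. The server then broadcasts the transformed operation, so Alice's client applies Bob's insert at position 2, while Bob's client transforms Alice's incoming insert against its own pending insert using the same rule. Either way, every replica converges.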
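A minimal sketch of an insert-vs-insert transform function in Python. It is not a complete OT implementation: deletes, multi-operation histories, and the client/server acknowledgement protocol are left out. The `site` field is a hypothetical tie-break key; here the lexicographically smaller site id goes first, which makes Alice win the tie as in the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    position: int  # 0-based character index
    text: str
    site: str      # hypothetical tie-break key, e.g. a client id

def apply(doc: str, op: Insert) -> str:
    return doc[:op.position] + op.text + doc[op.position:]

def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against`.

    Both operations were made against the same document version.
    Assumed tie-break rule: on equal positions, the smaller site id goes first.
    """
    if against.position < op.position or (
        against.position == op.position and against.site < op.site
    ):
        # `against` landed before us, so shift right by its length.
        return Insert(op.position + len(against.text), op.text, op.site)
    return op

base = "Hello"
alice = Insert(1, "i", "alice")
bob = Insert(1, "ey", "bob")

# Each side applies its own op, then the other op transformed against it.
alice_view = apply(apply(base, alice), transform(bob, alice))
bob_view = apply(apply(base, bob), transform(alice, bob))
print(alice_view, bob_view)  # Hieyello Hieyello
```

Both orders reach the same text as v483 in the example, because `transform` shifts only the losing insert.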

OT works well when there is a server-authoritative ordering point:

browser-based document editors
code editors with a central collaboration service
systems where the server validates permissions before accepting edits

The server usually stores an operation log and periodic snapshots. Reconnecting clients load a snapshot, replay missing operations, and rejoin the live stream.
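The log-plus-snapshot layout can be sketched as follows. `DocumentStore` and `SNAPSHOT_EVERY` are hypothetical, the interval of 3 is chosen only to keep the example short, and everything lives in memory; a real service would keep both structures in durable storage.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Insert:
    position: int
    text: str

def apply(doc: str, op: Insert) -> str:
    return doc[:op.position] + op.text + doc[op.position:]

SNAPSHOT_EVERY = 3  # hypothetical: snapshot after every 3 committed operations

@dataclass
class DocumentStore:
    # log[v] is the operation that moved the document from version v to v + 1
    log: list[Insert] = field(default_factory=list)
    # version -> full text at that version; version 0 is the empty document
    snapshots: dict[int, str] = field(default_factory=lambda: {0: ""})

    @property
    def version(self) -> int:
        return len(self.log)

    def load(self, version: int) -> str:
        """Rebuild the text at `version` from the nearest earlier snapshot plus the log."""
        base = max(v for v in self.snapshots if v <= version)
        doc = self.snapshots[base]
        for op in self.log[base:version]:
            doc = apply(doc, op)
        return doc

    def commit(self, op: Insert) -> int:
        """Append an already-transformed operation and snapshot periodically."""
        self.log.append(op)
        if self.version % SNAPSHOT_EVERY == 0:
            self.snapshots[self.version] = self.load(self.version)
        return self.version

    def reconnect(self) -> tuple[int, str, list[Insert]]:
        """What a reconnecting client needs: latest snapshot plus the ops after it."""
        base = max(self.snapshots)
        return base, self.snapshots[base], self.log[base:]

store = DocumentStore()
for op in [Insert(0, "H"), Insert(1, "ello"), Insert(1, "i"), Insert(2, "ey")]:
    store.commit(op)

# A reconnecting client loads the snapshot, then replays the missing operations.
snap_version, text, missing = store.reconnect()
for op in missing:
    text = apply(text, op)
print(snap_version, text)  # 3 Hieyello
```

Snapshots bound how much of the log a reconnecting client has to replay, which is also what makes it safe to compact old history later.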

#CRDTs

CRDT stands for Conflict-free Replicated Data Type.

Instead of transforming operations through one central sequencer, a CRDT gives every operation enough metadata to merge in any order. That is useful when clients need to edit offline, sync peer-to-peer, or accept operations from multiple regions.

A simple text CRDT gives every inserted character a stable position identifier:

H        e        l        l        o
id:1     id:4     id:5     id:6     id:7

Alice inserts "i" between id:1 and id:4 -> id:2A
Bob inserts "ey" between id:1 and id:4 -> id:2B, id:3B

The CRDT defines a deterministic ordering for ids between the same neighbors. Every replica can receive Alice then Bob, or Bob then Alice, and still sort the characters the same way.
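The ordering rule from the id example can be sketched in Python. `CharId` and `TextReplica` are hypothetical names; ids compare by position number first and replica name second, and the sketch assumes a free integer always exists between neighbours. Merging is a dictionary union keyed by id, so delivery order and duplicate delivery do not change the result.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class CharId:
    position: int  # assumed: a free integer exists between neighbours
    replica: str   # breaks ties when two replicas pick the same position

@dataclass
class TextReplica:
    chars: dict[CharId, str]

    def receive(self, updates: dict[CharId, str]) -> None:
        # Union keyed by id: commutative and idempotent.
        self.chars.update(updates)

    def text(self) -> str:
        return "".join(self.chars[cid] for cid in sorted(self.chars))

base = {CharId(1, "base"): "H", CharId(4, "base"): "e", CharId(5, "base"): "l",
        CharId(6, "base"): "l", CharId(7, "base"): "o"}
alice = {CharId(2, "A"): "i"}
bob = {CharId(2, "B"): "e", CharId(3, "B"): "y"}

r1 = TextReplica(dict(base))
r1.receive(alice)
r1.receive(bob)

r2 = TextReplica(dict(base))
r2.receive(bob)
r2.receive(alice)
r2.receive(alice)  # duplicate delivery is harmless

print(r1.text(), r2.text())  # Hieyello Hieyello
```

This toy scheme is enough for the example, but if Alice and Bob each typed longer runs, sorting by (number, replica) could interleave their characters (2A, 2B, 3A, 3B). Many production text CRDTs use identifier schemes designed to avoid that and to always leave room between neighbours.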

CRDTs are strongest when:

users can edit offline
updates may sync through multiple paths
there is no single reliable central sequencer
availability matters more than compact storage

The tradeoff is metadata. CRDT operations often carry more identifiers, tombstones, or causal metadata than OT operations. Text CRDTs also tend to need compaction, such as garbage-collecting tombstones, so document metadata does not grow without bound.
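Tombstones exist because merging has to be order-independent: if a replica simply forgot a deleted character, a delayed or retried copy of the original insert would bring it back. The sketch below uses a hypothetical `Replica` class with the same `(position, replica)` id idea as the id:2A / id:2B example. It records deleted ids as tombstones and drops them in `compact` once the caller asserts that every replica has applied both the insert and the delete; how a real system learns that is an assumption left out here.

```python
from dataclasses import dataclass, field

Id = tuple[int, str]  # (position, replica), compared position first

@dataclass
class Replica:
    chars: dict[Id, str] = field(default_factory=dict)
    tombstones: set[Id] = field(default_factory=set)

    def apply_insert(self, cid: Id, ch: str) -> None:
        # A late or duplicate insert must not resurrect deleted text.
        if cid not in self.tombstones:
            self.chars[cid] = ch

    def apply_delete(self, cid: Id) -> None:
        self.tombstones.add(cid)
        self.chars.pop(cid, None)

    def text(self) -> str:
        return "".join(self.chars[c] for c in sorted(self.chars))

    def compact(self, stable: set[Id]) -> None:
        """Forget tombstones that every replica is known to have applied."""
        self.tombstones -= stable

# Same operations, opposite delivery order.
r1 = Replica()
r1.apply_insert((1, "A"), "H")
r1.apply_insert((2, "A"), "i")
r1.apply_delete((2, "A"))

r2 = Replica()
r2.apply_delete((2, "A"))
r2.apply_insert((2, "A"), "i")  # arrives after the delete, stays deleted
r2.apply_insert((1, "A"), "H")

print(r1.text(), r2.text())  # H H
```

Calling `compact` too early is exactly the bug tombstones prevent, so the "stable" set has to come from some knowledge of what every replica has seen.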

#Choosing Between OT and CRDT

Use OT when the product already depends on a central collaboration service.

That gives you:

one authoritative operation order
simpler permission checks
easier audit logs
smaller operation metadata

Use CRDTs when offline or multi-master editing is a first-class requirement.

That gives you:

local edits without waiting for the server
mergeable updates from different replicas
better tolerance for disconnected clients

The wrong answer is picking one because it sounds more modern. The right answer starts with the product requirement.

| Requirement | Better fit |
| --- | --- |
| Google Docs-style online editing with a central service | OT |
| Figma-style sync with rich local state and offline tolerance | CRDT or hybrid |
| Need strict server validation before edits commit | OT |
| Need peer-to-peer or multi-region merge without a single sequencer | CRDT |

Many production systems are hybrids. They may use server sequencing for document text, CRDT-like structures for comments or cursors, and regular database transactions for permissions.

#Common Interview Mistakes

Mistake 1: Saying "use WebSockets" and stopping.

WebSockets solve transport. They do not solve concurrent edits.

Mistake 2: Sending the entire document on every keystroke.

That creates huge payloads and makes conflict resolution harder. Send operations.

Mistake 3: Using last-write-wins for text.

Last-write-wins drops valid edits. It may be fine for a profile photo. It is not fine for a paragraph.

Mistake 4: Ignoring retries.

Clients retry operations after reconnect. Use client operation ids so the same keystroke is not applied twice.
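Retry handling can be as small as a lookup keyed by `clientOperationId`. This sketch uses a hypothetical `Server` class, assumes ids are unique within a document, and keeps the seen-id map in memory; a real service would store it alongside the operation log and expire old entries.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    doc: str = ""
    version: int = 0
    # clientOperationId -> version that operation produced
    applied: dict[str, int] = field(default_factory=dict)

    def submit(self, client_operation_id: str, position: int, text: str) -> int:
        if client_operation_id in self.applied:
            # Retry of an already-committed op: acknowledge again, do not re-apply.
            return self.applied[client_operation_id]
        self.doc = self.doc[:position] + text + self.doc[position:]
        self.version += 1
        self.applied[client_operation_id] = self.version
        return self.version

s = Server(doc="Hello")
first = s.submit("op_9", 1, "i")
retry = s.submit("op_9", 1, "i")  # client reconnected and resent the same op
print(first, retry, s.doc)  # 1 1 Hiello
```

Returning the original version on a retry lets the client treat the duplicate as a normal acknowledgement.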

Mistake 5: Forgetting compaction.

Operation logs and CRDT metadata grow. Store snapshots and compact old history.

#Summary: What to Remember

Collaborative editing needs convergence, not just realtime transport.

Operational Transform transforms stale operations against already accepted operations. It fits server-authoritative editors where the collaboration service owns ordering.

CRDTs make operations mergeable in any order. They fit offline, peer-to-peer, and multi-master collaboration, but often carry more metadata.

In interviews, explain the conflict example, name the merge strategy, describe operation logs and snapshots, and call out idempotency for retries. That is the difference between "live CRUD" and a real collaborative editor.