#Introduction
The interviewer says: "Design Redis."
If the answer is only "hash the key and store it in memory," the follow-up will hurt:
What happens when a node fails? What if memory fills up? What if one key receives half the internet? What consistency guarantees does the client get?
A distributed cache is an in-memory key-value system with routing, replication, eviction, expiration, and failure handling.
Ready to practice? Try the Distributed Cache practice problem and build this system step-by-step with AI-guided feedback.
Related concepts for this design: Databases & Caching, Consistent Hashing, Hot Keys and Cache Stampedes, CAP Theorem, Sharding, and Scaling.
#Functional Requirements
1. Key-value storage with TTL
- Clients can set, get, and delete keys
- Values are stored in memory
- Keys may expire automatically after a TTL
TTL (time to live) controls expiration: it prevents stale or unbounded data from living forever. TTL choices also affect cache stampede risk, since many keys expiring at once can send a burst of misses to the database.
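The set/get/delete-with-TTL behavior above can be sketched in a few lines. This is a minimal single-node illustration, not a real cache implementation; the class and method names are invented for this example, and it expires keys lazily on read (a real node would also sweep expired keys in the background).

```python
import time

class TTLCache:
    """Minimal in-memory key-value store with lazy TTL expiration."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = time.monotonic() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiration: reap on read
            return None
        return value

    def delete(self, key):
        self._data.pop(key, None)
```

Lazy expiration keeps the write path cheap, at the cost of expired entries occupying memory until they are read or swept.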
2. Eviction policies
- Cache nodes evict keys when memory is full
- Policies can include LRU, LFU, or approximations
- Eviction should account for bytes, not just key count
Use Cache Eviction Internals to explain how a node chooses victims under memory pressure.
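The byte-aware LRU requirement above can be sketched with an ordered map. This is an illustrative toy (the names are made up for this example): the budget is tracked in value bytes rather than key count, so one large value can evict several small ones.

```python
from collections import OrderedDict

class LRUByteCache:
    """LRU eviction bounded by total value bytes, not key count."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._data = OrderedDict()  # key -> bytes value, oldest first

    def set(self, key, value):
        if key in self._data:
            self.used_bytes -= len(self._data.pop(key))
        self._data[key] = value
        self.used_bytes += len(value)
        # Evict least-recently-used victims until back under budget.
        while self.used_bytes > self.max_bytes:
            _, evicted = self._data.popitem(last=False)
            self.used_bytes -= len(evicted)

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]
```

A production node would also count per-key metadata overhead, not just value bytes, and might use an approximate LRU (sampling) to avoid maintaining a strict global ordering at high throughput.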
#Non-Functional Requirements
Elastic scaling
Adding or removing nodes should not remap every key. Use consistent hashing with virtual nodes or a cluster slot map.
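A quick way to show why consistent hashing limits remapping is to count how many keys move when a node joins. This sketch uses invented names and MD5 purely for illustration; with virtual nodes, adding a node only claims a fraction of the ring, and the displaced keys all move to the new node.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: adding a node remaps only
    a small fraction of keys instead of reshuffling everything."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        # A key is owned by the first virtual node clockwise from its hash.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

Contrast this with `hash(key) % N`, where going from N to N+1 nodes remaps almost every key.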
Hot key mitigation
One key can overload one shard. Use hot key and stampede mitigation: L1 caches, replica reads, request coalescing, and stale-while-revalidate.
Low latency
Reads and writes should usually be single-digit milliseconds or lower. Keep the data path simple and avoid cross-node coordination for normal operations.
Availability versus consistency
Many caches prefer availability and low latency over perfect consistency. Be explicit about read-after-write behavior and replica lag; this is where CAP Theorem and Consistency Patterns become practical.
#API Design
Get key
GET /v1/cache/{key}
Response:
{
  "key": "user:123",
  "value": "{...}",
  "ttlSeconds": 48,
  "version": 17
}
Set key
PUT /v1/cache/{key}
Request:
{
  "value": "{...}",
  "ttlSeconds": 300,
  "policy": "cache-aside"
}
Delete key
DELETE /v1/cache/{key}
Most real caches use a binary or text protocol rather than HTTP, but HTTP shapes are useful in interviews because they clarify operations and errors.
#High Level Design
The client library owns request routing. It fetches a cluster map from the config service, hashes the key to a shard, and sends the operation to the owning cache node.
Each cache node stores data in memory, tracks TTL and eviction metadata, and replicates hot or durable-enough keys to peers when configured. A metadata store tracks cluster membership and slot ownership.
#Detailed Design
Routing
Use consistent hashing or fixed hash slots. The client library should refresh topology when the cluster changes. This is the same resharding problem discussed in Sharding, but optimized for key-value cache lookups.
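The fixed-hash-slot alternative can be sketched as a slot map that the client refreshes. This is a simplified illustration (names invented for this example): Redis Cluster famously uses 16384 slots with CRC16; here `zlib.crc32` stands in. The point is that resharding moves whole slot ranges between nodes, so clients only need a new slot map, and keys never change slots.

```python
import zlib

NUM_SLOTS = 16384  # fixed slot count, similar in spirit to Redis Cluster

class SlotRouter:
    """Fixed hash slots: a key maps to a stable slot; resharding moves
    whole slots between nodes, so clients only refresh the slot map."""

    def __init__(self, slot_map):
        # slot_map: {node: [(start_slot, end_slot_inclusive), ...]}
        self._slot_to_node = {}
        for node, ranges in slot_map.items():
            for start, end in ranges:
                for slot in range(start, end + 1):
                    self._slot_to_node[slot] = node

    @staticmethod
    def slot_for(key):
        return zlib.crc32(key.encode()) % NUM_SLOTS

    def node_for(self, key):
        return self._slot_to_node[self.slot_for(key)]

    def move_slots(self, start, end, to_node):
        # Resharding: reassign a contiguous slot range to another node.
        for slot in range(start, end + 1):
            self._slot_to_node[slot] = to_node
```

In an interview, either consistent hashing or fixed slots is acceptable; slots make slot-by-slot migration and ownership tracking explicit.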
Replication
Replicate asynchronously for low latency, or synchronously when the caller needs stronger durability. For most cache use cases, async replication is acceptable.
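The async trade-off above can be made concrete with a toy sketch (names invented for this example): the write acknowledges as soon as the primary applies it, and a background worker ships it to the replica, which therefore lags briefly.

```python
import queue
import threading

class ReplicatedStore:
    """Async replication sketch: writes ack after the primary applies
    them; a background thread ships them to a replica afterwards."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._log = queue.Queue()
        threading.Thread(target=self._replicate, daemon=True).start()

    def set(self, key, value):
        self.primary[key] = value    # acknowledge immediately (low latency)
        self._log.put((key, value))  # replicate in the background

    def _replicate(self):
        while True:
            key, value = self._log.get()
            self.replica[key] = value  # replica may lag the primary
            self._log.task_done()

    def flush(self):
        self._log.join()  # wait for the replica to catch up
```

The lag window is exactly what a reader hits when it queries the replica between `set` and `flush`; synchronous replication closes that window by acknowledging only after the replica applies the write, at the cost of added write latency.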
Eviction
Each node enforces memory limits locally. LRU protects recent working sets. LFU protects stable hot keys. Approximate policies are often better at very high throughput. The lower-level mechanics are covered in Cache Eviction Internals.
Failure handling
If a cache node fails, clients remap affected keys. The database or source of truth should handle misses. A cache failure should degrade performance, not corrupt durable data.
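The degrade-don't-corrupt behavior above is essentially the cache-aside read path. A minimal sketch, with invented names, assuming the cache client raises `ConnectionError` when a node is down:

```python
class CacheAsideClient:
    """Cache-aside reads: a cache node failure degrades to a slower
    database read instead of an error."""

    def __init__(self, cache, db_lookup):
        self._cache = cache          # get/set; may raise ConnectionError
        self._db_lookup = db_lookup  # source of truth

    def get(self, key):
        try:
            value = self._cache.get(key)
            if value is not None:
                return value
        except ConnectionError:
            # Node down: fall through to the database; don't fail the read.
            return self._db_lookup(key)
        value = self._db_lookup(key)
        try:
            self._cache.set(key, value)  # populate on miss
        except ConnectionError:
            pass  # best effort: the cache is not the source of truth
        return value
```

Note the asymmetry: a failed cache write is swallowed, because losing a cache entry costs latency, not correctness.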
#Common Interview Mistakes
- Using `hash(key) % N` and ignoring remapping during scale-out
- Treating the cache as the source of truth
- Confusing TTL expiration with memory-pressure eviction
- Ignoring hot keys and cache stampedes
- Requiring strong consistency on every cache read without explaining latency cost
See also: Cache Eviction Internals, Hot Keys and Cache Stampedes, Consistent Hashing, and Databases & Caching.