Design a Distributed Cache

Building a low-latency key-value cache with sharding, eviction, and hot-key handling

System Design Sandbox · 15 min read
Learn how to design a distributed cache. Covers key routing, consistent hashing, TTL expiration, LRU/LFU eviction, replication, hot key mitigation, cache stampede protection, and consistency tradeoffs.

#Introduction

The interviewer says: "Design Redis."

If the answer is only "hash the key and store it in memory," the follow-up will hurt:

What happens when a node fails? What if memory fills up? What if one key receives half the internet? What consistency guarantees does the client get?

A distributed cache is an in-memory key-value system with routing, replication, eviction, expiration, and failure handling.

Ready to practice? Try the Distributed Cache practice problem and build this system step-by-step with AI-guided feedback.

Related concepts for this design: Databases & Caching, Consistent Hashing, Hot Keys and Cache Stampedes, CAP Theorem, Sharding, and Scaling.


#Functional Requirements

1. Key-value storage with TTL

  • Clients can set, get, and delete keys
  • Values are stored in memory
  • Keys may expire automatically after a TTL

TTL (time to live) controls expiration: it prevents stale or unbounded data from living forever. TTL choices also affect cache stampede risk, because many keys expiring at once can send a burst of misses to the origin.
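A minimal sketch of lazy TTL expiration with a bit of jitter (the `TTLStore` class and the 10% jitter factor are illustrative assumptions, not any particular cache's implementation):

```python
import random
import time

class TTLStore:
    """Illustrative in-memory store with lazy TTL expiration."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        # Small random jitter so keys written together don't all expire at
        # the same instant, which would invite a stampede on the origin.
        jitter = ttl_seconds * random.uniform(0.0, 0.1)
        self._data[key] = (value, time.monotonic() + ttl_seconds + jitter)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Lazily expired: delete on read instead of scanning in the background.
            del self._data[key]
            return None
        return value
```

Real caches usually pair this lazy check with a background sweep so expired keys that are never read again still free memory.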

2. Eviction policies

  • Cache nodes evict keys when memory is full
  • Policies can include LRU, LFU, or approximations
  • Eviction should account for bytes, not just key count

Use cache eviction internals to explain how the node chooses victims under pressure.
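As a sketch of the bullet about evicting by bytes, here is a bytes-aware LRU (the `ByteLRU` name and budget numbers are illustrative; production caches typically use approximations rather than a strict ordered map):

```python
from collections import OrderedDict

class ByteLRU:
    """Illustrative LRU cache that evicts by total bytes, not key count."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._data = OrderedDict()  # key -> value; insertion order = recency

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self.used_bytes -= len(self._data.pop(key))
        self._data[key] = value
        self.used_bytes += len(value)
        # Evict least-recently-used entries until back under the byte budget.
        while self.used_bytes > self.max_bytes:
            _, evicted = self._data.popitem(last=False)
            self.used_bytes -= len(evicted)

cache = ByteLRU(max_bytes=10)
cache.set("a", b"12345")
cache.set("b", b"12345")
cache.get("a")          # touch "a" so "b" becomes the LRU victim
cache.set("c", b"123")  # over budget: evicts "b", keeps "a" and "c"
```

Counting bytes matters because one 10 MB value costs as much as thousands of small ones; a count-based limit would miss that entirely.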


#Non-Functional Requirements

Elastic scaling

Adding or removing nodes should not remap every key. Use consistent hashing with virtual nodes or a cluster slot map.
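A minimal sketch of a consistent hash ring with virtual nodes (the `ConsistentHashRing` class, the MD5 choice, and the vnode count are illustrative assumptions):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Illustrative consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node, vnodes=100):
        # Each physical node appears many times on the ring so load
        # spreads evenly and a node's departure remaps only its slices.
        for i in range(vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def owner(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))  # first vnode clockwise of the key
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
node = ring.owner("user:123")  # deterministic: same key, same owner
```

The key property: removing one node only remaps the keys that node owned; every other key keeps its owner.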

Hot key mitigation

One key can overload one shard. Use hot key and stampede mitigation: L1 caches, replica reads, request coalescing, and stale-while-revalidate.

Low latency

Reads and writes should usually be single-digit milliseconds or lower. Keep the data path simple and avoid cross-node coordination for normal operations.

Availability versus consistency

Many caches prefer availability and low latency over perfect consistency. Be explicit about read-after-write behavior and replica lag; this is where CAP Theorem and Consistency Patterns become practical.


#API Design

Get key

GET /v1/cache/{key}

Response:

{
  "key": "user:123",
  "value": "{...}",
  "ttlSeconds": 48,
  "version": 17
}

Set key

PUT /v1/cache/{key}

Request:

{
  "value": "{...}",
  "ttlSeconds": 300,
  "policy": "cache-aside"
}

Delete key

DELETE /v1/cache/{key}

Most real caches use a binary or text protocol rather than HTTP, but HTTP shapes are useful in interviews because they clarify operations and errors.


#High Level Design

The main components:

  • Application Client
  • Client Library
  • Cluster Config
  • Cache Shard A / Cache Shard B
  • TTL / Eviction Index
  • Hot Key Replicas
  • Origin Database

The client library owns request routing. It fetches a cluster map from the config service, hashes the key to a shard, and sends the operation to the owning cache node.

Each cache node stores data in memory, tracks TTL and eviction metadata, and replicates hot or durability-sensitive keys to peers when configured. A metadata store tracks cluster membership and slot ownership.


#Detailed Design

Routing

Use consistent hashing or fixed hash slots. The client library should refresh topology when the cluster changes. This is the same resharding problem discussed in Sharding, but optimized for key-value cache lookups.
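A sketch of the fixed-hash-slot approach (Redis Cluster routes with CRC16 % 16384; the MD5-based `slot_for` and the two-node `slot_map` here are illustrative stand-ins):

```python
import hashlib

NUM_SLOTS = 16384  # same slot count Redis Cluster uses

def slot_for(key: str) -> int:
    # Hash the key into one of a fixed number of slots.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SLOTS

# Slot map: contiguous slot ranges assigned to nodes. Resharding moves a
# range of slots between nodes without recomputing any key's slot.
slot_map = {range(0, 8192): "cache-a", range(8192, 16384): "cache-b"}

def owner(key: str) -> str:
    s = slot_for(key)
    for slots, node in slot_map.items():
        if s in slots:
            return node
    raise KeyError(key)
```

The indirection is the point: keys map to slots forever, and only the slot-to-node assignment changes when the cluster scales.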

Replication

Replicate asynchronously for low latency, or synchronously when the caller needs stronger durability. For most cache use cases, async replication is acceptable.

Eviction

Each node enforces memory limits locally. LRU protects recent working sets. LFU protects stable hot keys. Approximate policies are often better at very high throughput. The lower-level mechanics are covered in Cache Eviction Internals.

Failure handling

If a cache node fails, clients remap affected keys. The database or source of truth should handle misses. A cache failure should degrade performance, not corrupt durable data.
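A hedged sketch of that degradation path with cache-aside reads (`get_user`, the `cache`/`db` interfaces, and the 300-second TTL are illustrative assumptions):

```python
def get_user(user_id, cache, db):
    """Cache-aside read: a cache failure degrades to a DB read, never data loss."""
    key = f"user:{user_id}"
    try:
        value = cache.get(key)
    except ConnectionError:
        value = None  # cache node down: treat as a miss, fall through to the DB
    if value is not None:
        return value
    value = db.load(user_id)  # the database stays the source of truth
    try:
        cache.set(key, value, ttl_seconds=300)
    except ConnectionError:
        pass  # best effort: failing to repopulate must not fail the request
    return value
```

Note both cache calls are wrapped: a dead cache node turns every read into a DB read (slower), but the response is still correct.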


#Common Interview Mistakes

  • Using hash(key) % N and ignoring remapping during scale-out
  • Treating cache as the source of truth
  • Confusing TTL expiration with memory-pressure eviction
  • Ignoring hot keys and cache stampedes
  • Requiring strong consistency on every cache read without explaining latency cost
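The first mistake is easy to demonstrate numerically: with `hash(key) % N` routing, going from 4 to 5 nodes remaps about 80% of keys, while consistent hashing would move only about 1/5 of them (the MD5 hash and key set below are arbitrary choices for the demo):

```python
import hashlib

def bucket(key, n):
    # Naive modulo routing: the bucket depends directly on N.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(bucket(k, 4) != bucket(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped going from 4 to 5 nodes")
# Roughly 80% of keys move on scale-out; consistent hashing moves ~1/(N+1),
# i.e. about 20% here.
```

That one-line `% N` decision is why a "simple" scale-out can empty most of the cache and hammer the origin database.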

See also: Cache Eviction Internals, Hot Keys and Cache Stampedes, Consistent Hashing, and Databases & Caching.