Hot Keys and Cache Stampedes

Protecting caches and databases from viral keys and synchronized misses

System Design Sandbox · 9 min read
Learn how high-scale systems handle hot keys and cache stampedes. Covers hot key detection, L1 caches, replica reads, request coalescing, stale-while-revalidate, TTL jitter, negative caching, and thundering herd protection.

#Introduction

One key becomes popular.

It might be a viral post, a celebrity profile, a contest leaderboard, or a product launch page. Suddenly every request hits the same cache node.

If the key expires at the same time, every request may also hit the database.

That is the hot key and cache stampede problem.

This article is part of the distributed-cache cluster of concepts: Databases & Caching explains the basic cache-aside pattern, Cache Eviction Internals explains memory pressure, and Consistent Hashing explains key ownership. The full applied problem is Design a Distributed Cache.


#Hot Keys

A hot key is a key whose traffic is far above the average shard load.

Consistent hashing spreads keys across nodes, but it does not split one key. If tweet:123 receives 500,000 reads per second, the node that owns tweet:123 receives that traffic.

Mitigations include:

  • local L1 cache in application workers
  • replica reads for hot keys
  • key splitting for read-only values
  • CDN or edge cache when values are public
  • request rate limiting for expensive keys

Hot keys should be detected through per-key or sampled telemetry. You cannot mitigate what you cannot see. This is also why sharding discussions should include skew, not just average distribution.
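Detection can be as simple as sampled per-key counters in each worker: record a small fraction of reads and flag keys whose sampled count dominates. A minimal sketch (class name, sample rate, and threshold are all illustrative, not a specific library):

```python
import random
from collections import Counter

class HotKeyDetector:
    """Sampled per-key counters: record ~1% of reads and flag keys
    whose sampled count crosses a threshold. Illustrative sketch."""

    def __init__(self, sample_rate=0.01, hot_threshold=100):
        self.sample_rate = sample_rate
        self.hot_threshold = hot_threshold  # sampled hits per window
        self.counts = Counter()

    def record(self, key):
        # Sampling keeps per-request overhead low at high request rates.
        if random.random() < self.sample_rate:
            self.counts[key] += 1

    def hot_keys(self):
        # At 1% sampling, a sampled count of 100 suggests ~10,000 real reads.
        return [k for k, c in self.counts.items() if c >= self.hot_threshold]

# Demo uses sample_rate=1.0 so the outcome is deterministic.
detector = HotKeyDetector(sample_rate=1.0, hot_threshold=5)
for _ in range(10):
    detector.record("tweet:123")
detector.record("user:7")
print(detector.hot_keys())
```

In production the counters would reset per time window and feed a metrics pipeline, but the shape is the same: sample, count, compare against a skew threshold.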


#Cache Stampede

A stampede happens when many requests miss the cache at once.

Common causes:

  • popular key expires
  • cache node restarts
  • deploy clears all local caches
  • upstream dependency becomes slow

Without protection:

10,000 requests -> cache miss -> 10,000 database reads

With request coalescing:

10,000 requests -> one refresh -> 9,999 wait or serve stale

Request flow: Many Clients → L1 Cache → Request Coalescer → Hot Key Replicas → Origin Store
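Request coalescing can be sketched in a few lines. The helper below is hypothetical (not a library API) and omits error handling and timeouts that production code would need; it shows the core idea: the first caller for a key becomes the leader and fetches, everyone else waits for that result.

```python
import threading
import time

class SingleFlight:
    """Request coalescing sketch: concurrent callers for the same key
    share one origin fetch. Illustrative; real code also needs
    fetch-error propagation and timeouts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_holder)

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, holder = entry
        if leader:
            holder["value"] = fetch(key)  # only the leader hits the origin
            with self._lock:
                del self._inflight[key]
            done.set()
        else:
            done.wait()  # followers just wait for the leader's result
        return holder["value"]

# Demo: 20 threads miss at once, but the origin is read only once.
origin_reads = []

def fetch(key):
    origin_reads.append(key)
    time.sleep(0.2)  # slow origin read keeps all followers coalesced
    return f"row-for-{key}"

sf = SingleFlight()
results = []
threads = [threading.Thread(target=lambda: results.append(sf.do("tweet:123", fetch)))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This mirrors the 10,000-to-1 reduction above: N concurrent misses collapse into one database read.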

#Mitigation Patterns

The standard toolbox:

  • singleflight/request coalescing
  • stale-while-revalidate
  • probabilistic early refresh
  • TTL jitter
  • negative caching for missing keys
  • circuit breakers around the database
  • hot key replication
  • client-side L1 cache

Stale-while-revalidate is often the best interview answer for read-heavy systems. Serve the last known value briefly after expiry while one worker refreshes it in the background.

TTL jitter prevents thousands of related keys from expiring in the same second. Stale-while-revalidate is the cache equivalent of the stale snapshot strategy in Search Serving Architecture.
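Both patterns are small. The sketch below (names and TTLs are illustrative) serves the stale value immediately once the soft TTL passes and refreshes inline for simplicity; a real system would refresh in a background worker and coalesce those refreshes:

```python
import random
import time

def jittered_ttl(base_ttl, jitter_fraction=0.1):
    """TTL jitter: spread expiries by up to ±10% so bulk-loaded keys
    do not expire in the same second. Fractions are illustrative."""
    return base_ttl * random.uniform(1 - jitter_fraction, 1 + jitter_fraction)

class SWRCache:
    """Stale-while-revalidate sketch: after the soft TTL, return the old
    value and store a fresh one for the next caller. Refreshing happens
    inline here only to keep the example short."""

    def __init__(self, soft_ttl, fetch):
        self.soft_ttl = soft_ttl
        self.fetch = fetch
        self.store = {}  # key -> (value, fetched_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry is None:  # true miss: must fetch before answering
            value = self.fetch(key)
            self.store[key] = (value, now)
            return value
        value, fetched_at = entry
        if now - fetched_at > self.soft_ttl:
            # Stale: refresh the entry, but still serve the old value now.
            self.store[key] = (self.fetch(key), now)
        return value

# Demo: fetch returns increasing versions 0, 1, 2, ...
versions = iter(range(10))
cache = SWRCache(soft_ttl=5, fetch=lambda k: next(versions))
a = cache.get("x", now=0)   # miss: fetches version 0
b = cache.get("x", now=3)   # fresh: still version 0
c = cache.get("x", now=10)  # stale: serves 0, refreshes to 1
d = cache.get("x", now=11)  # fresh again: version 1
```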


#Common Interview Mistakes

Mistake 1: Assuming sharding fixes one hot key.

Sharding spreads many keys. It does not split traffic for a single key unless you add replication or key splitting.
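For read-only values, key splitting is the usual fix: store N suffixed copies of the value so they hash to different nodes, and have readers pick a copy at random. A minimal sketch (suffix format and copy count are illustrative):

```python
import random

NUM_COPIES = 8  # replicas of the hot value; tune to the observed skew

def split_key(key, num_copies=NUM_COPIES):
    """Readers pick one suffixed copy at random, so the hot key's
    traffic spreads across the nodes owning the different copies."""
    return f"{key}#{random.randrange(num_copies)}"

def all_copies(key, num_copies=NUM_COPIES):
    # The writer must update every copy so readers see consistent data.
    return [f"{key}#{i}" for i in range(num_copies)]

print(all_copies("tweet:123"))
print(split_key("tweet:123"))
```

The cost is N writes per update and a window where copies can disagree, which is why this works best for read-only or rarely changing values.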

Mistake 2: Letting every miss refresh.

Use request coalescing so one caller refreshes and the rest wait or receive stale data.

Mistake 3: Using identical TTLs for bulk-loaded data.

Identical TTLs create synchronized expiry waves.

Mistake 4: Forgetting negative caching.

Repeated misses for nonexistent keys can overload the database too.
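Negative caching fixes this by remembering "not found" for a short TTL. A sketch under assumed names (the sentinel and TTLs are illustrative; `db_get` stands in for any store that returns `None` for absent rows):

```python
import time

NEGATIVE_TTL = 30    # cache "not found" briefly; illustrative value
POSITIVE_TTL = 300   # normal entries live longer
MISSING = object()   # sentinel: "we checked, the row does not exist"

class NegativeCache:
    """Negative caching sketch: repeated lookups for a nonexistent key
    stop hitting the database until the negative entry expires."""

    def __init__(self, db_get):
        self.db_get = db_get  # returns None when the row is absent
        self.store = {}       # key -> (value_or_MISSING, expires_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry and entry[1] > now:
            value = entry[0]
            return None if value is MISSING else value
        value = self.db_get(key)
        if value is None:
            self.store[key] = (MISSING, now + NEGATIVE_TTL)
        else:
            self.store[key] = (value, now + POSITIVE_TTL)
        return value

# Demo: only the first lookup for a missing key reaches the database.
db_calls = []
def db_get(key):
    db_calls.append(key)
    return None

cache = NegativeCache(db_get)
cache.get("ghost", now=0)
cache.get("ghost", now=1)
```

Keep the negative TTL short: if the key is created later, a long negative entry would hide it.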


#Summary: What to Remember

Hot keys overload one shard. Stampedes overload the source of truth.

Use L1 caches, hot key replication, request coalescing, stale-while-revalidate, TTL jitter, and negative caching. In interviews, explicitly separate "many keys unevenly distributed" from "one key receiving too much traffic."

Related articles: Cache Eviction Internals, Databases & Caching, Sharding, Consistent Hashing, Search Serving Architecture, and Design a Distributed Cache.