Hot Keys and Cache Stampedes

Protecting caches and databases from viral keys and synchronized misses

System Design Sandbox · 9 min read
Learn how high-scale systems handle hot keys and cache stampedes. Covers hot key detection, L1 caches, replica reads, request coalescing, stale-while-revalidate, TTL jitter, negative caching, and thundering herd protection.

#Introduction

One key becomes popular.

It might be a viral post, a celebrity profile, a contest leaderboard, or a product launch page. Suddenly every request hits the same cache node.

If the key expires at the same time, every request may also hit the database.

That is the hot key and cache stampede problem.

This article is part of the distributed-cache cluster of concepts: Databases & Caching explains the basic cache-aside pattern, Cache Eviction Internals explains memory pressure, and Consistent Hashing explains key ownership. The full applied problem is Design a Distributed Cache.


#Hot Keys

A hot key is a key whose traffic is far above the average shard load.

Consistent hashing spreads keys across nodes, but it does not split one key. If tweet:123 receives 500,000 reads per second, the node that owns tweet:123 receives that traffic.

Mitigations include:

  • local L1 cache in application workers
  • replica reads for hot keys
  • key splitting for read-only values
  • CDN or edge cache when values are public
  • request rate limiting for expensive keys

Hot keys should be detected through per-key or sampled telemetry. You cannot mitigate what you cannot see. This is also why sharding discussions should include skew, not just average distribution.
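Detection can be as simple as sampled per-key counters in each worker: record a small fraction of reads and flag keys whose sampled count dominates. A minimal sketch (class name, sample rate, and threshold are all illustrative, not a specific library):

```python
import random
from collections import Counter

class HotKeyDetector:
    """Sampled per-key counters: record ~1% of reads and flag keys
    whose sampled count crosses a threshold. Illustrative sketch."""

    def __init__(self, sample_rate=0.01, hot_threshold=100):
        self.sample_rate = sample_rate
        self.hot_threshold = hot_threshold  # sampled hits per window
        self.counts = Counter()

    def record(self, key):
        # Sampling keeps per-request overhead low at high request rates.
        if random.random() < self.sample_rate:
            self.counts[key] += 1

    def hot_keys(self):
        # At 1% sampling, a sampled count of 100 suggests ~10,000 real reads.
        return [k for k, c in self.counts.items() if c >= self.hot_threshold]

# Demo uses sample_rate=1.0 so the outcome is deterministic.
detector = HotKeyDetector(sample_rate=1.0, hot_threshold=5)
for _ in range(10):
    detector.record("tweet:123")
detector.record("user:7")
print(detector.hot_keys())
```

In production the counters would reset per time window and feed a metrics pipeline, but the shape is the same: sample, count, compare against a skew threshold.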


#Cache Stampede

A stampede happens when many requests miss the cache at once.

Common causes:

  • popular key expires
  • cache node restarts
  • deploy clears all local caches
  • upstream dependency becomes slow

Without protection:

10,000 requests -> cache miss -> 10,000 database reads

With request coalescing:

10,000 requests -> one refresh -> 9,999 wait or serve stale

Request flow: Many Clients → L1 Cache → Request Coalescer → Hot Key Replicas → Origin Store
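Request coalescing can be sketched in a few lines. The helper below is hypothetical (not a library API) and omits error handling and timeouts that production code would need; it shows the core idea: the first caller for a key becomes the leader and fetches, everyone else waits for that result.

```python
import threading
import time

class SingleFlight:
    """Request coalescing sketch: concurrent callers for the same key
    share one origin fetch. Illustrative; real code also needs
    fetch-error propagation and timeouts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_holder)

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, holder = entry
        if leader:
            holder["value"] = fetch(key)  # only the leader hits the origin
            with self._lock:
                del self._inflight[key]
            done.set()
        else:
            done.wait()  # followers just wait for the leader's result
        return holder["value"]

# Demo: 20 threads miss at once, but the origin is read only once.
origin_reads = []

def fetch(key):
    origin_reads.append(key)
    time.sleep(0.2)  # slow origin read keeps all followers coalesced
    return f"row-for-{key}"

sf = SingleFlight()
results = []
threads = [threading.Thread(target=lambda: results.append(sf.do("tweet:123", fetch)))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This mirrors the 10,000-to-1 reduction above: N concurrent misses collapse into one database read.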

#Mitigation Patterns

The standard toolbox:

  • singleflight/request coalescing
  • stale-while-revalidate
  • probabilistic early refresh
  • TTL jitter
  • negative caching for missing keys
  • circuit breakers around the database
  • hot key replication
  • client-side L1 cache

Stale-while-revalidate is often the best interview answer for read-heavy systems. Serve the last known value briefly after expiry while one worker refreshes it in the background.

TTL jitter prevents thousands of related keys from expiring in the same second. Stale-while-revalidate is the cache equivalent of the stale snapshot strategy in Search Serving Architecture.
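Both patterns are small. The sketch below (names and TTLs are illustrative) serves the stale value immediately once the soft TTL passes and refreshes inline for simplicity; a real system would refresh in a background worker and coalesce those refreshes:

```python
import random
import time

def jittered_ttl(base_ttl, jitter_fraction=0.1):
    """TTL jitter: spread expiries by up to ±10% so bulk-loaded keys
    do not expire in the same second. Fractions are illustrative."""
    return base_ttl * random.uniform(1 - jitter_fraction, 1 + jitter_fraction)

class SWRCache:
    """Stale-while-revalidate sketch: after the soft TTL, return the old
    value and store a fresh one for the next caller. Refreshing happens
    inline here only to keep the example short."""

    def __init__(self, soft_ttl, fetch):
        self.soft_ttl = soft_ttl
        self.fetch = fetch
        self.store = {}  # key -> (value, fetched_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry is None:  # true miss: must fetch before answering
            value = self.fetch(key)
            self.store[key] = (value, now)
            return value
        value, fetched_at = entry
        if now - fetched_at > self.soft_ttl:
            # Stale: refresh the entry, but still serve the old value now.
            self.store[key] = (self.fetch(key), now)
        return value

# Demo: fetch returns increasing versions 0, 1, 2, ...
versions = iter(range(10))
cache = SWRCache(soft_ttl=5, fetch=lambda k: next(versions))
a = cache.get("x", now=0)   # miss: fetches version 0
b = cache.get("x", now=3)   # fresh: still version 0
c = cache.get("x", now=10)  # stale: serves 0, refreshes to 1
d = cache.get("x", now=11)  # fresh again: version 1
```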


#Common Interview Mistakes

Mistake 1: Assuming sharding fixes one hot key.

Sharding spreads many keys. It does not split traffic for a single key unless you add replication or key splitting.
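For read-only values, key splitting is the usual fix: store N suffixed copies of the value so they hash to different nodes, and have readers pick a copy at random. A minimal sketch (suffix format and copy count are illustrative):

```python
import random

NUM_COPIES = 8  # replicas of the hot value; tune to the observed skew

def split_key(key, num_copies=NUM_COPIES):
    """Readers pick one suffixed copy at random, so the hot key's
    traffic spreads across the nodes owning the different copies."""
    return f"{key}#{random.randrange(num_copies)}"

def all_copies(key, num_copies=NUM_COPIES):
    # The writer must update every copy so readers see consistent data.
    return [f"{key}#{i}" for i in range(num_copies)]

print(all_copies("tweet:123"))
print(split_key("tweet:123"))
```

The cost is N writes per update and a window where copies can disagree, which is why this works best for read-only or rarely changing values.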

Mistake 2: Letting every miss refresh.

Use request coalescing so one caller refreshes and the rest wait or receive stale data.

Mistake 3: Using identical TTLs for bulk-loaded data.

Identical TTLs create synchronized expiry waves.

Mistake 4: Forgetting negative caching.

Repeated misses for nonexistent keys can overload the database too.
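Negative caching fixes this by remembering "not found" for a short TTL. A sketch under assumed names (the sentinel and TTLs are illustrative; `db_get` stands in for any store that returns `None` for absent rows):

```python
import time

NEGATIVE_TTL = 30    # cache "not found" briefly; illustrative value
POSITIVE_TTL = 300   # normal entries live longer
MISSING = object()   # sentinel: "we checked, the row does not exist"

class NegativeCache:
    """Negative caching sketch: repeated lookups for a nonexistent key
    stop hitting the database until the negative entry expires."""

    def __init__(self, db_get):
        self.db_get = db_get  # returns None when the row is absent
        self.store = {}       # key -> (value_or_MISSING, expires_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry and entry[1] > now:
            value = entry[0]
            return None if value is MISSING else value
        value = self.db_get(key)
        if value is None:
            self.store[key] = (MISSING, now + NEGATIVE_TTL)
        else:
            self.store[key] = (value, now + POSITIVE_TTL)
        return value

# Demo: only the first lookup for a missing key reaches the database.
db_calls = []
def db_get(key):
    db_calls.append(key)
    return None

cache = NegativeCache(db_get)
cache.get("ghost", now=0)
cache.get("ghost", now=1)
```

Keep the negative TTL short: if the key is created later, a long negative entry would hide it.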


#Summary: What to Remember

Hot keys overload one shard. Stampedes overload the source of truth.

Use L1 caches, hot key replication, request coalescing, stale-while-revalidate, TTL jitter, and negative caching. In interviews, explicitly separate "many keys unevenly distributed" from "one key receiving too much traffic."

Related articles: Cache Eviction Internals, Databases & Caching, Sharding, Consistent Hashing, Search Serving Architecture, and Design a Distributed Cache.