Search Autocomplete (Typeahead)

Serving top-k prefix suggestions with low latency and fresh popularity rankings

System Design Sandbox · 11 min read
Learn how to design search autocomplete. Covers Trie and prefix indexes, top-k ranking, hot prefix caching, query event ingestion, batch versus streaming refreshes, and edge serving.

#Introduction

The interviewer says: "Design search autocomplete."

You suggest running a database LIKE 'prefix%' query on every keystroke. Then the follow-up arrives: "What if there are billions of queries, users type fast, and trending searches need to appear within minutes?"

Autocomplete is a low-latency top-k prefix lookup system. The read path should be precomputed and cache-friendly, while the write path aggregates completed searches into refreshed rankings.

Ready to practice? Try the Autocomplete practice problem and build this system step-by-step with AI-guided feedback.


#Functional Requirements

1. Prefix matching

  • A user types a prefix
  • The system returns matching suggestions
  • The response should usually include the top 5 suggestions

A Trie is the classic data structure. Each edge represents a character, and each node represents a prefix. If each node stores the top suggestions for that prefix, serving a request is fast.
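
A minimal sketch of this idea (class and method names are illustrative, not a prescribed API): each node along a query's path keeps its own precomputed top-k list, so a lookup is a single walk down the prefix followed by a constant-time read.

```python
class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.top = []       # precomputed [(suggestion, score)], highest score first

class SuggestionTrie:
    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def insert(self, query, score):
        """Store `query` in the top-k list of every prefix node along its path."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            # Drop any stale entry for this query, then re-rank and truncate.
            node.top = [t for t in node.top if t[0] != query]
            node.top.append((query, score))
            node.top.sort(key=lambda t: -t[1])
            node.top = node.top[: self.k]

    def lookup(self, prefix):
        """Return the precomputed top-k for `prefix` in O(len(prefix))."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return node.top
```

Note the asymmetry this buys: lookups never rank anything at request time, but every insert touches one node per character of the query.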

2. Popularity ranking

  • Suggestions are ranked by search frequency
  • Ranking can be global, regional, language-specific, or personalized
  • Completed search events are the source of popularity counts

The serving path should not count raw query events synchronously. Query events should flow into an aggregation pipeline that refreshes top-k snapshots.


#Non-Functional Requirements

Keystroke latency

Autocomplete should return in roughly 50-100ms. Use client-side debouncing, edge caching, in-memory prefix maps, and small response payloads. The read path is a specialized example of search serving architecture.
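
Client-side debouncing can be made concrete with a small deterministic sketch (the function name and window default are illustrative): given timestamped keystrokes, only prefixes followed by a sufficient pause actually trigger a network request.

```python
def debounce(events, window_ms=150):
    """Given (timestamp_ms, prefix) keystroke events in order, return the
    prefixes that would fire a request: those followed by a pause of at
    least `window_ms`. The final keystroke always fires."""
    fired = []
    for i, (ts, prefix) in enumerate(events):
        is_last = i == len(events) - 1
        if is_last or events[i + 1][0] - ts >= window_ms:
            fired.append(prefix)
    return fired
```

Typing five characters quickly with one mid-word pause produces two requests instead of five, which is exactly the QPS reduction debouncing is for.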

Trending freshness

The system does not need exact counts per keystroke, but it should reflect major trends within minutes or hours. Batch refreshes are simpler. Streaming refreshes are fresher but more complex.

High QPS

Autocomplete receives a request for nearly every character typed. Hot prefixes such as "a", "new", or "weather" need aggressive caching.

Availability

If ranking refresh is delayed, serving should continue from the last known snapshot. Stale suggestions are better than an empty dropdown.
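
One way to sketch stale-but-available serving (function names are hypothetical): wrap the live lookup so any failure in the fresh path degrades to the last known snapshot instead of an empty result.

```python
def serve_suggestions(prefix, fetch_fresh, fallback_snapshot):
    """Serve from the live path if possible; on any failure, fall back to
    the last known snapshot rather than returning an empty dropdown."""
    try:
        return fetch_fresh(prefix)
    except Exception:
        return fallback_snapshot.get(prefix, [])
```

In production the fallback would be the most recent snapshot loaded into memory, with monitoring on its age so prolonged pipeline outages are noticed.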


#API Design

Fetch suggestions

GET /api/v1/suggestions?prefix=new%20y&limit=5&locale=en-US&userRegion=US

Response:

{
  "suggestions": [
    { "text": "new york weather", "score": 98210, "source": "regional" },
    { "text": "new york times", "score": 77120, "source": "global" }
  ]
}

Record search event

POST /api/v1/search-events

Request:

{
  "eventId": "evt_123",
  "query": "new york weather",
  "userRegion": "US",
  "language": "en",
  "timestamp": "2026-04-22T12:00:00Z"
}

Response:

{
  "accepted": true,
  "eventId": "evt_123"
}
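
A client building the suggestions request above might assemble the query string like this (a sketch; the helper name and host are assumptions, and the parameter names match the API shown above):

```python
from urllib.parse import urlencode, quote

def build_suggestions_url(base_url, prefix, limit=5, locale="en-US", region="US"):
    """Build the GET /api/v1/suggestions URL, percent-encoding the prefix
    (quote_via=quote turns spaces into %20 rather than '+')."""
    params = {"prefix": prefix, "limit": limit,
              "locale": locale, "userRegion": region}
    return f"{base_url}/api/v1/suggestions?" + urlencode(params, quote_via=quote)
```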

#High Level Design

Search Client
Edge Cache / API
Suggestion Service
Trie / Prefix Cache
Suggestion Store
Search Event Log
Ranking Aggregator

The client debounces input and calls the closest edge/API endpoint. Hot prefixes are served from edge cache. Cache misses go to the suggestion service, which reads from a memory-resident Trie or prefix map.

Completed searches are appended to an event log. A ranking aggregator computes counts and top-k suggestions by prefix, region, language, and time window. Refreshed snapshots are written to the suggestion store and loaded into serving caches.


#Detailed Design

Trie serving

Each Trie node can store:

  • prefix
  • top 5 global suggestions
  • top 5 regional suggestions
  • last updated timestamp

This makes reads fast but increases write complexity because a completed query affects every prefix along its path.

Aggregation

For the query "new york weather", increment counts for:

  • n
  • ne
  • new
  • new y
  • new yo
  • and so on

This can be done in batch jobs, stream processors, or a hybrid approach.
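
The batch variant can be sketched in a few lines (names are illustrative): fold completed queries into a per-prefix counter, then materialize a top-k snapshot per prefix.

```python
from collections import Counter

def aggregate_top_k(queries, k=5, min_prefix_len=1):
    """Batch-aggregate (query, count) pairs from the event log into a
    per-prefix top-k snapshot: {prefix: [(query, count), ...]}."""
    prefix_counts = {}  # prefix -> Counter of full queries
    for query, count in queries:
        # Every prefix of the query gets credit for the completed search.
        for end in range(min_prefix_len, len(query) + 1):
            prefix = query[:end]
            prefix_counts.setdefault(prefix, Counter())[query] += count
    return {prefix: counter.most_common(k)
            for prefix, counter in prefix_counts.items()}
```

A streaming version would apply the same per-prefix increments incrementally (e.g. in a stream processor keyed by prefix) instead of recomputing from scratch each run.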

Caching

Cache hot prefixes at the edge and in service memory. Set minimum prefix lengths if one-character prefixes are too expensive or low quality.
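
Both ideas, TTL-bounded LRU caching plus a minimum prefix length, fit in a small in-memory sketch (class name, capacity, and TTL are illustrative defaults; `now` is injectable to keep the logic deterministic):

```python
import time
from collections import OrderedDict

class PrefixCache:
    """In-memory LRU cache with TTL. Prefixes shorter than
    `min_prefix_len` are never cached (or served) here."""

    def __init__(self, capacity=10000, ttl_s=60, min_prefix_len=2):
        self.capacity = capacity
        self.ttl_s = ttl_s
        self.min_prefix_len = min_prefix_len
        self._store = OrderedDict()  # prefix -> (expires_at, suggestions)

    def get(self, prefix, now=None):
        if len(prefix) < self.min_prefix_len:
            return None
        now = time.monotonic() if now is None else now
        entry = self._store.get(prefix)
        if entry is None or entry[0] < now:
            self._store.pop(prefix, None)  # drop expired entry
            return None
        self._store.move_to_end(prefix)    # LRU touch
        return entry[1]

    def put(self, prefix, suggestions, now=None):
        if len(prefix) < self.min_prefix_len:
            return
        now = time.monotonic() if now is None else now
        self._store[prefix] = (now + self.ttl_s, suggestions)
        self._store.move_to_end(prefix)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
```

The TTL doubles as a freshness bound: even a hot prefix re-reads the latest snapshot within one TTL of a ranking refresh.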

Personalization

Start with global and regional rankings. Add personalization only after the base system is correct, because personalization increases privacy, storage, and ranking complexity.


#Common Interview Mistakes

  • Querying the primary database on every keystroke
  • Returning all matches instead of top-k suggestions
  • Updating Trie rankings synchronously on the user request path
  • Ignoring client-side debouncing
  • Forgetting stale-but-available serving during ranking pipeline failures