Search Autocomplete (Typeahead)

Serving top-k prefix suggestions with low latency and fresh popularity rankings

System Design Sandbox · 11 min read
Learn how to design search autocomplete. Covers Trie and prefix indexes, top-k ranking, hot prefix caching, query event ingestion, batch versus streaming refreshes, and edge serving.

#Introduction

The interviewer says: "Design search autocomplete."

You suggest running a database LIKE 'prefix%' query on every keystroke. Then the follow-up arrives: "What if there are billions of queries, users type fast, and trending searches need to appear within minutes?"

Autocomplete is a low-latency top-k prefix lookup system. The read path should be precomputed and cache-friendly, while the write path aggregates completed searches into refreshed rankings.

Ready to practice? Try the Autocomplete practice problem and build this system step-by-step with AI-guided feedback.


#Functional Requirements

1. Prefix matching

  • A user types a prefix
  • The system returns matching suggestions
  • The response should usually include the top 5 suggestions

A Trie is the classic data structure. Each edge represents a character, and each node represents a prefix. If each node stores the top suggestions for that prefix, serving a request is fast.
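
A minimal sketch of this idea (class and method names are illustrative, not a prescribed API): each node along a query's path keeps its own precomputed top-k list, so a lookup is a single walk down the prefix followed by a constant-time read.

```python
class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.top = []       # precomputed [(suggestion, score)], highest score first

class SuggestionTrie:
    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def insert(self, query, score):
        """Store `query` in the top-k list of every prefix node along its path."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            # Drop any stale entry for this query, then re-rank and truncate.
            node.top = [t for t in node.top if t[0] != query]
            node.top.append((query, score))
            node.top.sort(key=lambda t: -t[1])
            node.top = node.top[: self.k]

    def lookup(self, prefix):
        """Return the precomputed top-k for `prefix` in O(len(prefix))."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return node.top
```

Note the asymmetry this buys: lookups never rank anything at request time, but every insert touches one node per character of the query.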

2. Popularity ranking

  • Suggestions are ranked by search frequency
  • Ranking can be global, regional, language-specific, or personalized
  • Completed search events are the source of popularity counts

The serving path should not count raw query events synchronously. Query events should flow into an aggregation pipeline that refreshes top-k snapshots.


#Non-Functional Requirements

Keystroke latency

Autocomplete should return in roughly 50-100ms. Use client-side debouncing, edge caching, in-memory prefix maps, and small response payloads. The read path is a specialized example of search serving architecture.
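
Client-side debouncing can be made concrete with a small deterministic sketch (the function name and window default are illustrative): given timestamped keystrokes, only prefixes followed by a sufficient pause actually trigger a network request.

```python
def debounce(events, window_ms=150):
    """Given (timestamp_ms, prefix) keystroke events in order, return the
    prefixes that would fire a request: those followed by a pause of at
    least `window_ms`. The final keystroke always fires."""
    fired = []
    for i, (ts, prefix) in enumerate(events):
        is_last = i == len(events) - 1
        if is_last or events[i + 1][0] - ts >= window_ms:
            fired.append(prefix)
    return fired
```

Typing five characters quickly with one mid-word pause produces two requests instead of five, which is exactly the QPS reduction debouncing is for.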

Trending freshness

The system does not need exact counts per keystroke, but it should reflect major trends within minutes or hours. Batch refreshes are simpler. Streaming refreshes are fresher but more complex.

High QPS

Autocomplete receives a request for nearly every character typed. Hot prefixes such as "a", "new", or "weather" need aggressive caching.

Availability

If ranking refresh is delayed, serving should continue from the last known snapshot. Stale suggestions are better than an empty dropdown.
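
One way to sketch stale-but-available serving (function names are hypothetical): wrap the live lookup so any failure in the fresh path degrades to the last known snapshot instead of an empty result.

```python
def serve_suggestions(prefix, fetch_fresh, fallback_snapshot):
    """Serve from the live path if possible; on any failure, fall back to
    the last known snapshot rather than returning an empty dropdown."""
    try:
        return fetch_fresh(prefix)
    except Exception:
        return fallback_snapshot.get(prefix, [])
```

In production the fallback would be the most recent snapshot loaded into memory, with monitoring on its age so prolonged pipeline outages are noticed.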


#API Design

Fetch suggestions

GET /api/v1/suggestions?prefix=new%20y&limit=5&locale=en-US&userRegion=US

Response:

{
  "suggestions": [
    { "text": "new york weather", "score": 98210, "source": "regional" },
    { "text": "new york times", "score": 77120, "source": "global" }
  ]
}

Record search event

POST /api/v1/search-events

Request:

{
  "eventId": "evt_123",
  "query": "new york weather",
  "userRegion": "US",
  "language": "en",
  "timestamp": "2026-04-22T12:00:00Z"
}

Response:

{
  "accepted": true,
  "eventId": "evt_123"
}
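
A client building the suggestions request above might assemble the query string like this (a sketch; the helper name and host are assumptions, and the parameter names match the API shown above):

```python
from urllib.parse import urlencode, quote

def build_suggestions_url(base_url, prefix, limit=5, locale="en-US", region="US"):
    """Build the GET /api/v1/suggestions URL, percent-encoding the prefix
    (quote_via=quote turns spaces into %20 rather than '+')."""
    params = {"prefix": prefix, "limit": limit,
              "locale": locale, "userRegion": region}
    return f"{base_url}/api/v1/suggestions?" + urlencode(params, quote_via=quote)
```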

#High Level Design

Search Client
Edge Cache / API
Suggestion Service
Trie / Prefix Cache
Suggestion Store
Search Event Log
Ranking Aggregator

The client debounces input and calls the closest edge/API endpoint. Hot prefixes are served from edge cache. Cache misses go to the suggestion service, which reads from a memory-resident Trie or prefix map.

Completed searches are appended to an event log. A ranking aggregator computes counts and top-k suggestions by prefix, region, language, and time window. Refreshed snapshots are written to the suggestion store and loaded into serving caches.


#Detailed Design

Trie serving

Each Trie node can store:

  • prefix
  • top 5 global suggestions
  • top 5 regional suggestions
  • last updated timestamp

This makes reads fast but increases write complexity because a completed query affects every prefix along its path.

Aggregation

For the query "new york weather", increment counts for:

  • n
  • ne
  • new
  • new y
  • new yo
  • and so on

This can be done in batch jobs, stream processors, or a hybrid approach.
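
The batch variant can be sketched in a few lines (names are illustrative): fold completed queries into a per-prefix counter, then materialize a top-k snapshot per prefix.

```python
from collections import Counter

def aggregate_top_k(queries, k=5, min_prefix_len=1):
    """Batch-aggregate (query, count) pairs from the event log into a
    per-prefix top-k snapshot: {prefix: [(query, count), ...]}."""
    prefix_counts = {}  # prefix -> Counter of full queries
    for query, count in queries:
        # Every prefix of the query gets credit for the completed search.
        for end in range(min_prefix_len, len(query) + 1):
            prefix = query[:end]
            prefix_counts.setdefault(prefix, Counter())[query] += count
    return {prefix: counter.most_common(k)
            for prefix, counter in prefix_counts.items()}
```

A streaming version would apply the same per-prefix increments incrementally (e.g. in a stream processor keyed by prefix) instead of recomputing from scratch each run.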

Caching

Cache hot prefixes at the edge and in service memory. Set minimum prefix lengths if one-character prefixes are too expensive or low quality.
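
Both ideas, TTL-bounded LRU caching plus a minimum prefix length, fit in a small in-memory sketch (class name, capacity, and TTL are illustrative defaults; `now` is injectable to keep the logic deterministic):

```python
import time
from collections import OrderedDict

class PrefixCache:
    """In-memory LRU cache with TTL. Prefixes shorter than
    `min_prefix_len` are never cached (or served) here."""

    def __init__(self, capacity=10000, ttl_s=60, min_prefix_len=2):
        self.capacity = capacity
        self.ttl_s = ttl_s
        self.min_prefix_len = min_prefix_len
        self._store = OrderedDict()  # prefix -> (expires_at, suggestions)

    def get(self, prefix, now=None):
        if len(prefix) < self.min_prefix_len:
            return None
        now = time.monotonic() if now is None else now
        entry = self._store.get(prefix)
        if entry is None or entry[0] < now:
            self._store.pop(prefix, None)  # drop expired entry
            return None
        self._store.move_to_end(prefix)    # LRU touch
        return entry[1]

    def put(self, prefix, suggestions, now=None):
        if len(prefix) < self.min_prefix_len:
            return
        now = time.monotonic() if now is None else now
        self._store[prefix] = (now + self.ttl_s, suggestions)
        self._store.move_to_end(prefix)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
```

The TTL doubles as a freshness bound: even a hot prefix re-reads the latest snapshot within one TTL of a ranking refresh.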

Personalization

Start with global and regional rankings. Add personalization only after the base system is correct, because personalization increases privacy, storage, and ranking complexity.


#Common Interview Mistakes

  • Querying the primary database on every keystroke
  • Returning all matches instead of top-k suggestions
  • Updating Trie rankings synchronously on the user request path
  • Ignoring client-side debouncing
  • Forgetting stale-but-available serving during ranking pipeline failures