Search Serving Architecture

Separating source-of-truth writes from low-latency search reads

System Design Sandbox · 10 min read
Learn how search systems serve fast reads from derived indexes. Covers serving indexes, offline builds, atomic swaps, hot query caches, freshness tiers, fallbacks, and stale-serving policy.

#Introduction

You are designing autocomplete or local search. The user types a prefix or drags a map, and the UI expects results immediately.

The primary database holds the truth. But it is not the best serving path for every keystroke, radius search, or ranked query.

Search serving architecture is the pattern of building a read-optimized view of data, serving from that view, and refreshing it without blocking users.


#Serving Indexes vs Source of Truth

The source of truth is where writes are validated:

  • business metadata database
  • document database
  • query event log
  • product catalog database

The serving index is optimized for reads:

  • prefix to top-k suggestions
  • geospatial cell to business ids
  • inverted index from token to document ids
  • vector index from embedding to nearest neighbors

The serving index is a derived view. If it is lost, you rebuild it from the source of truth or event log.

Source of truth
  -> index builder
  -> serving index
  -> low-latency query API

This split keeps writes correct and reads fast.
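The split above can be sketched with a toy prefix index derived from a source-of-truth table. The record shape, scores, and `build_prefix_index` helper are illustrative assumptions, not a real schema:

```python
from collections import defaultdict

# Source of truth: validated business records (illustrative shape).
SOURCE_OF_TRUTH = [
    {"id": 1, "name": "coffee corner", "score": 0.9},
    {"id": 2, "name": "coffee club", "score": 0.7},
    {"id": 3, "name": "cobbler shop", "score": 0.4},
]

def build_prefix_index(records, max_prefix_len=3, k=2):
    """Derive a read-optimized view: prefix -> top-k suggestion names."""
    candidates = defaultdict(list)
    for rec in records:
        for n in range(1, min(max_prefix_len, len(rec["name"])) + 1):
            candidates[rec["name"][:n]].append(rec)
    # Keep only the top-k names per prefix, ranked by score.
    return {
        prefix: [r["name"] for r in sorted(cands, key=lambda r: -r["score"])[:k]]
        for prefix, cands in candidates.items()
    }

index = build_prefix_index(SOURCE_OF_TRUTH)
# Reads hit the derived view, never the write path.
print(index["co"])
```

If the index is lost, rerunning `build_prefix_index` over the source records recreates it exactly, which is what makes it safe to treat as disposable.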


#Offline Builds and Atomic Swaps

Many search indexes are easier to build offline.

1. Read source data or events.
2. Build a new index snapshot.
3. Validate counts and basic queries.
4. Upload snapshot to storage.
5. Warm serving nodes.
6. Atomically switch traffic to the new version.

Atomic swaps avoid half-built indexes. Serving nodes either use version 41 or version 42, not a random mix of partially copied files.

For autocomplete, that snapshot might be a compact Trie or prefix map. For proximity search, it might be geospatial cells mapped to business ids.

Fresh writes can flow through a small delta index while the next full snapshot builds.
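One way to sketch the delta path: reads consult a small mutable delta first, then fall back to the immutable snapshot. The `DeltaOverlay` name and last-write-wins merge are assumptions; real systems often merge and rank across both:

```python
class DeltaOverlay:
    """Fresh writes land in a small delta; the full snapshot rebuilds offline."""
    def __init__(self, snapshot):
        self.snapshot = snapshot   # immutable full build
        self.delta = {}            # recent writes since the snapshot

    def write(self, key, value):
        self.delta[key] = value    # visible immediately, no rebuild needed

    def read(self, key):
        # Delta wins over the snapshot for overlapping keys.
        if key in self.delta:
            return self.delta[key]
        return self.snapshot.get(key)

idx = DeltaOverlay(snapshot={"pizza": ["Pizza Place"]})
idx.write("pizza", ["Pizza Place", "New Pizzeria"])
print(idx.read("pizza"))
```

When the next full snapshot lands, it already contains the delta's writes, so the delta can be truncated at swap time.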


#Hot Query Caches

Search traffic is usually skewed.

Autocomplete prefixes like "a", "new", and "weather" are hot. Map searches for dense city centers are hot. Product searches for popular brands are hot.

Use cache layers deliberately:

  • client debounce to avoid unnecessary requests
  • edge cache for public, popular queries
  • service memory cache for hot prefixes or cells
  • durable serving store for cold lookups

Do not cache blindly. Include the dimensions that affect the result:

prefix + language + region + safeSearch + device

Caching "new" globally may serve bad results if rankings are regional or language-specific.
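A sketch of a cache key that includes every result-affecting dimension listed above; the field names and separator are assumptions:

```python
def cache_key(prefix, language, region, safe_search, device):
    """Include every dimension that can change the ranked result."""
    return f"{prefix}|{language}|{region}|{int(safe_search)}|{device}"

# Same prefix, different regions: distinct cache entries,
# so a US ranking is never served to an IN user.
print(cache_key("new", "en", "US", True, "mobile"))
print(cache_key("new", "en", "IN", True, "mobile"))
```

The trade-off is explicit: each added dimension lowers the hit rate, so only include dimensions that actually change the result.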


#Fallbacks and Freshness

Search serving systems should degrade gracefully.

If the latest index build fails:

  • keep serving the previous snapshot
  • alert on staleness
  • disable only the affected region or variant
  • fall back to a slower but safe path for low-traffic queries
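The first two bullets can be sketched as a refresh loop that keeps the last good snapshot when a build fails. The `build_snapshot` callable and the staleness threshold are assumptions; alerting here is just a log line:

```python
import logging
import time

class SnapshotHolder:
    """Serve the last good snapshot; a failed refresh never takes reads down."""
    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.built_at = time.time()

    def refresh(self, build_snapshot, max_staleness_s=3600):
        try:
            new_snapshot = build_snapshot()  # build and validate a new version
            self.snapshot = new_snapshot
            self.built_at = time.time()
        except Exception:
            # Keep serving the previous snapshot; alert once it gets too old.
            age = time.time() - self.built_at
            if age > max_staleness_s:
                logging.warning("serving stale index: %.0fs old", age)

holder = SnapshotHolder({"a": ["apple"]})

def failing_build():
    raise RuntimeError("upstream ranking job failed")

holder.refresh(failing_build)
print(holder.snapshot)  # still the last good snapshot
```

Readers never observe the failure directly; it surfaces as staleness monitoring, not as an outage.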

Freshness depends on the product:

  Product                      Freshness expectation
  Autocomplete trends          minutes to hours
  Business address update      minutes
  Search relevance tuning      hours to days
  Security or legal removal    immediate or near-immediate

When freshness requirements differ, split the pipeline. Do not force every update through the slowest full rebuild path.
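Splitting the pipeline can be sketched as routing each update down the path that matches its freshness tier. The update kinds and queue names are illustrative assumptions that follow the table above:

```python
# Kinds that cannot wait for the next full rebuild (illustrative).
URGENT_KINDS = {"legal_removal", "security_removal", "address_update"}

def route_update(update, delta_queue, rebuild_queue):
    """Send each update down the path matching its freshness requirement."""
    if update["kind"] in URGENT_KINDS:
        delta_queue.append(update)     # applied to the serving delta quickly
    else:
        rebuild_queue.append(update)   # picked up by the next offline build

delta, rebuild = [], []
route_update({"kind": "legal_removal", "id": 7}, delta, rebuild)
route_update({"kind": "relevance_tuning", "id": 8}, delta, rebuild)
print(len(delta), len(rebuild))  # 1 1
```

The slow full-rebuild path stays simple and batch-friendly, while the urgent path stays small enough to apply in minutes.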


#Common Interview Mistakes

Mistake 1: Querying the write database for every search.

The write database is optimized for correctness, not every read pattern.

Mistake 2: Treating the index as source of truth.

Indexes are derived. You need a rebuild path.

Mistake 3: Rebuilding in place.

Build a new version, validate it, then swap.

Mistake 4: Ignoring cache dimensions.

Region, language, personalization, and safety filters can all change results.

Mistake 5: Having no stale-serving policy.

If the ranking job fails, the product should keep serving the last good index.


#Summary: What to Remember

Search serving separates truth from speed.

The source of truth accepts validated writes. The serving index answers read-heavy, latency-sensitive queries. Builders create snapshots or deltas. Serving nodes cache hot results and keep the last good version when refresh fails.

This pattern supports autocomplete, geospatial search, product search, document search, and many ranking systems.