# Introduction
You are designing autocomplete or local search. The user types a prefix or drags a map, and the UI expects results immediately.
The primary database has the truth. But the primary database is not the best serving path for every keystroke, radius search, or ranked query.
Search serving architecture is the pattern of building a read-optimized view of data, serving from that view, and refreshing it without blocking users.
# Serving Indexes vs Source of Truth
The source of truth is where writes are validated:
- business metadata database
- document database
- query event log
- product catalog database
The serving index is optimized for reads:
- prefix to top-k suggestions
- geospatial cell to business ids
- inverted index from token to document ids
- vector index from embedding to nearest neighbors
The serving index is a derived view. If it is lost, you rebuild it from the source of truth or event log.
Source of truth -> index builder -> serving index -> low-latency query API
This split keeps writes correct and reads fast.
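The split above can be sketched in a few lines. This is a minimal illustration, not a production design: `build_prefix_index`, `MAX_PREFIX_LEN`, and `TOP_K` are assumed names, and the catalog rows stand in for whatever the source of truth actually stores. The point is that the serving index is derived, so it can always be rebuilt by rerunning the builder.

```python
from collections import defaultdict

MAX_PREFIX_LEN = 5  # cap prefix length to bound index size
TOP_K = 3           # suggestions kept per prefix

def build_prefix_index(catalog):
    """Derive a prefix -> top-k suggestions view from source-of-truth rows.

    catalog: list of (term, popularity) tuples.
    """
    index = defaultdict(list)
    for term, popularity in catalog:
        for i in range(1, min(len(term), MAX_PREFIX_LEN) + 1):
            index[term[:i]].append((popularity, term))
    # Keep only the top-k suggestions per prefix, ranked by popularity.
    return {
        prefix: [t for _, t in sorted(entries, reverse=True)[:TOP_K]]
        for prefix, entries in index.items()
    }

# Rebuilding is just rerunning the builder over the source of truth.
catalog = [("weather", 90), ("web", 70), ("western", 40), ("news", 80)]
index = build_prefix_index(catalog)
```

Because the index is a pure function of the catalog, losing it costs a rebuild, not data.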
# Offline Builds and Atomic Swaps
Many search indexes are easier to build offline.
1. Read source data or events.
2. Build a new index snapshot.
3. Validate counts and basic queries.
4. Upload snapshot to storage.
5. Warm serving nodes.
6. Atomically switch traffic to the new version.
Atomic swaps avoid half-built indexes. Serving nodes either use version 41 or version 42, not a random mix of partially copied files.
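One common way to get that all-or-nothing behavior is a version pointer: write the new snapshot to its own path, then atomically replace a small "CURRENT" pointer file. The sketch below assumes JSON snapshots on a local filesystem; real systems typically use object storage and a metadata service, and the function names here are illustrative.

```python
import json
import os
import tempfile

def publish_snapshot(index, version, data_dir):
    """Write a full snapshot, then atomically repoint CURRENT at it."""
    snapshot_path = os.path.join(data_dir, f"index-v{version}.json")
    with open(snapshot_path, "w") as f:
        json.dump(index, f)
    # Write the pointer to a temp file first, then rename it into place.
    # os.replace is atomic on POSIX: readers see v41 or v42, never a mix.
    fd, tmp = tempfile.mkstemp(dir=data_dir)
    with os.fdopen(fd, "w") as f:
        f.write(snapshot_path)
    os.replace(tmp, os.path.join(data_dir, "CURRENT"))

def load_current(data_dir):
    """Follow the CURRENT pointer to the active snapshot."""
    with open(os.path.join(data_dir, "CURRENT")) as f:
        snapshot_path = f.read()
    with open(snapshot_path) as f:
        return json.load(f)

data_dir = tempfile.mkdtemp()
publish_snapshot({"new": ["news"]}, 41, data_dir)
publish_snapshot({"new": ["news", "new york"]}, 42, data_dir)
```

Keeping old snapshot files around also gives you an instant rollback: repoint CURRENT at the previous version.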
For autocomplete, that snapshot might be a compact trie or prefix map. For proximity search, it might be geospatial cells mapped to business ids.
Fresh writes can flow through a small delta index while the next full snapshot builds.
# Hot Query Caches
Search traffic is usually skewed.
Autocomplete prefixes like `a`, `new`, and `weather` are hot. Map searches for dense city centers are hot. Product searches for popular brands are hot.
Use cache layers deliberately:
- client debounce to avoid unnecessary requests
- edge cache for public, popular queries
- service memory cache for hot prefixes or cells
- durable serving store for cold lookups
Do not cache blindly. Include the dimensions that affect the result:
`prefix + language + region + safeSearch + device`
Caching `new` globally may serve bad results if rankings are regional or language-specific.
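A minimal sketch of a dimension-aware cache key, using the dimensions named above (the `cache_key` helper and separator are assumptions, not a standard API):

```python
def cache_key(prefix, language, region, safe_search, device):
    """Join every dimension that can change the result into one key."""
    return "|".join([prefix, language, region, str(safe_search), device])

# The same prefix in two regions produces two distinct cache entries,
# so a regional ranking never leaks across regions.
us_key = cache_key("new", "en", "US", True, "mobile")
in_key = cache_key("new", "en", "IN", True, "mobile")
```

The failure mode this prevents is subtle: a cache keyed only on `prefix` looks correct in testing, then silently serves one region's ranking to every other region in production.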
# Fallbacks and Freshness
Search serving systems should degrade gracefully.
If the latest index build fails:
- keep serving the previous snapshot
- alert on staleness
- disable only the affected region or variant
- fall back to a slower but safe path for low-traffic queries
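The "keep serving the previous snapshot" behavior can be sketched as a holder that only replaces its snapshot when a build succeeds, and exposes staleness for alerting. The names are illustrative; a real system would emit a metric instead of just tracking a timestamp.

```python
import time

class SnapshotHolder:
    """Serve the last good snapshot; a failed build never replaces it."""

    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.last_refresh = time.time()

    def try_refresh(self, build_fn):
        try:
            new_snapshot = build_fn()
        except Exception:
            # Build failed: keep the last good snapshot. In production,
            # this is where you would fire an alert rather than pass.
            return
        self.snapshot = new_snapshot
        self.last_refresh = time.time()

    def staleness_seconds(self):
        # Alert when this exceeds the product's freshness expectation.
        return time.time() - self.last_refresh

holder = SnapshotHolder({"new": ["news"]})

def failing_build():
    raise RuntimeError("ranking job failed")

holder.try_refresh(failing_build)  # users still get the last good index
```

Note that the snapshot is only swapped after `build_fn` returns, so readers never observe a partially built index.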
Freshness depends on the product:
| Product | Freshness expectation |
|---|---|
| Autocomplete trends | minutes to hours |
| Business address update | minutes |
| Search relevance tuning | hours to days |
| Security or legal removal | immediate or near-immediate |
When freshness requirements differ, split the pipeline. Do not force every update through the slowest full rebuild path.
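One way to sketch that split: legal or security removals go through an immediate tombstone path, while relevance changes wait for the next full rebuild. The class and method names are illustrative assumptions.

```python
class SplitPipelineIndex:
    """Fast path for removals, slow path (full rebuild) for everything else."""

    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.removed = set()  # tombstones, applied immediately

    def remove_now(self, key):
        # Fast path: a legal/security removal takes effect on the next read.
        self.removed.add(key)

    def lookup(self, key):
        if key in self.removed:
            return None
        return self.snapshot.get(key)

    def swap(self, new_snapshot):
        # Slow path: the full rebuild already excludes removed entries.
        self.snapshot = new_snapshot
        self.removed.clear()

idx = SplitPipelineIndex({"doc1": "listing", "doc2": "listing"})
idx.remove_now("doc1")  # immediate, without waiting hours for a rebuild
```

The tombstone set is tiny relative to the index, which is exactly why it can meet an "immediate or near-immediate" freshness bar that a full rebuild cannot.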
# Common Interview Mistakes
Mistake 1: Querying the write database for every search.
The write database is optimized for correctness, not for every read pattern.
Mistake 2: Treating the index as source of truth.
Indexes are derived. You need a rebuild path.
Mistake 3: Rebuilding in place.
Build a new version, validate it, then swap.
Mistake 4: Ignoring cache dimensions.
Region, language, personalization, and safety filters can all change results.
Mistake 5: Having no stale-serving policy.
If the ranking job fails, the product should keep serving the last good index.
# Summary: What to Remember
Search serving separates truth from speed.
The source of truth accepts validated writes. The serving index answers read-heavy, latency-sensitive queries. Builders create snapshots or deltas. Serving nodes cache hot results and keep the last good version when refresh fails.
This pattern supports autocomplete, geospatial search, product search, document search, and many ranking systems.