#Introduction
The interviewer says: "Design Yelp nearby search."
You propose storing a latitude and longitude column for every business, then calculating the distance to each row at query time. The follow-up is immediate: "What if there are 100 million businesses and the user only wants coffee within two miles?"
A proximity service is a geospatial retrieval problem. The system must quickly narrow a two-dimensional area into a small candidate set, then rank and filter those candidates.
#Functional Requirements
1. Nearby business search
- Users search by latitude, longitude, radius, category, rating, price, and open-now filters
- Results should include distance, rating, review count, and enough metadata to render a list
- Results should be paginated
The core mistake is scanning every business and computing exact distance. The system needs a spatial index.
#Geospatial Indexing
Common approaches:
- Geohash: encode latitude and longitude into sortable strings where nearby points often share prefixes
- S2 or H3: divide the world into hierarchical cells
- R-tree: index bounding boxes for spatial range queries
A typical flow is:
- Convert the search point and radius into covering cells
- Read candidate business IDs from those cells
- Fetch business metadata
- Compute exact distance and apply filters
- Rank and paginate the results
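The flow above can be sketched end to end as follows. This is a minimal illustration, not a production index: it assumes a fixed latitude/longitude grid (`CELL_DEG`, a hypothetical cell size) as a stand-in for geohash, S2, or H3 cells, and it ignores the antimeridian and the poles. The helper names and the business dictionary shape are assumptions made for the example.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0
CELL_DEG = 0.01  # hypothetical grid cell size, roughly 1.1 km north-south


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cell_of(lat, lng):
    """Map a point to its (row, column) grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


def covering_cells(lat, lng, radius_m):
    """Cells overlapping a padded bounding box around the search circle."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    # Use the latitude farthest from the equator so the box is never too narrow.
    widest_lat = min(abs(lat) + dlat, 89.0)
    dlng = dlat / math.cos(math.radians(widest_lat))
    r0, c0 = cell_of(lat - dlat, lng - dlng)
    r1, c1 = cell_of(lat + dlat, lng + dlng)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def build_index(businesses):
    """Geo index: cell -> list of business IDs located in that cell."""
    index = defaultdict(list)
    for b in businesses:
        index[cell_of(b["lat"], b["lng"])].append(b["id"])
    return index


def search_nearby(index, by_id, lat, lng, radius_m, category=None, limit=20):
    hits = []
    for cell in covering_cells(lat, lng, radius_m):   # 1. covering cells
        for biz_id in index.get(cell, []):            # 2. candidate IDs
            b = by_id[biz_id]                         # 3. metadata lookup
            d = haversine_m(lat, lng, b["lat"], b["lng"])
            # 4. exact distance check plus attribute filters
            if d <= radius_m and (category is None or b["category"] == category):
                hits.append({**b, "distanceMeters": round(d)})
    # 5. rank (distance only here) and cut to one page
    hits.sort(key=lambda h: (h["distanceMeters"], h["id"]))
    return hits[:limit]
```

The exact distance check still runs, but only over the candidates in the covering cells rather than over every business.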
2. Reviews and ratings
- Users can read business reviews
- Users can leave star ratings and text reviews
- Business search results should show rating aggregates
Reviews are read-heavy. Store raw reviews separately and maintain denormalized aggregates like average rating and review count on the business profile.
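One way to keep the aggregate cheap to maintain is to store a running count and sum rather than recomputing the average from raw reviews. The sketch below assumes a hypothetical `RatingAggregate` record stored alongside the business profile; the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RatingAggregate:
    """Denormalized rating summary kept on the business profile."""
    review_count: int = 0
    rating_sum: int = 0

    def add(self, stars: int) -> None:
        """Fold one new review into the aggregate in O(1)."""
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self.review_count += 1
        self.rating_sum += stars

    @property
    def average(self) -> float:
        """Average rating rounded to one decimal, or 0.0 with no reviews."""
        if self.review_count == 0:
            return 0.0
        return round(self.rating_sum / self.review_count, 1)
```

Search reads only `average` and `review_count`, so the raw review store never sits on the search path.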
#Non-Functional Requirements
Read-heavy performance
Business locations change slowly, but users search constantly. Cache hot geospatial cells, popular category filters, and business profile summaries.
Global availability
Most searches are geographically local. A Tokyo user searching for sushi should not need to hit a US-East database. Partition data by region and use regional routing so search traffic reaches nearby clusters.
Freshness
New businesses and address corrections should appear quickly, but search results can tolerate brief staleness, so caches with short TTLs are acceptable. Review aggregate updates can be asynchronous.
Ranking quality
Distance is only one ranking signal. Relevance may also include category match, rating, review count, open hours, promoted listings, and personalization.
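As an illustration of blending signals, the sketch below combines distance, rating, and review count into one score. The weights and the normalization choices are assumptions made for the example, not values from a real ranking system; promoted listings and personalization are left out.

```python
import math

# Hypothetical weights; a real system would tune these.
WEIGHTS = {"distance": 0.5, "rating": 0.3, "popularity": 0.2}


def score(biz, radius_m):
    """Blend normalized signals into a score in [0, 1]; higher is better."""
    distance_score = 1.0 - min(biz["distanceMeters"] / radius_m, 1.0)
    rating_score = biz["rating"] / 5.0
    # Log scale so 10,000 reviews do not drown out everything else.
    popularity_score = min(math.log10(1 + biz["reviewCount"]) / 4.0, 1.0)
    return (
        WEIGHTS["distance"] * distance_score
        + WEIGHTS["rating"] * rating_score
        + WEIGHTS["popularity"] * popularity_score
    )


def rank(businesses, radius_m):
    """Sort candidates by descending score."""
    return sorted(businesses, key=lambda b: score(b, radius_m), reverse=True)
```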
#API Design
Nearby search
GET /api/v1/businesses/search?lat=40.741&lng=-73.989&radiusMeters=2000&category=coffee&limit=20
Response:
{
  "businesses": [
    {
      "id": "biz_123",
      "name": "North Star Coffee",
      "distanceMeters": 312,
      "rating": 4.6,
      "reviewCount": 871
    }
  ],
  "nextCursor": "cell_abc:score_312"
}
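A cursor such as `nextCursor` can be produced with keyset pagination: the cursor records the sort key of the last item returned, and the next page resumes strictly after it. The sketch below is one possible scheme and assumes results are sorted by `(distanceMeters, id)`. The base64 JSON encoding is an assumption for illustration and is not the cursor format shown in the response above.

```python
import base64
import json


def encode_cursor(distance_m, biz_id):
    """Pack the last item's sort key into an opaque, URL-safe string."""
    raw = json.dumps([distance_m, biz_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    """Recover the (distance, id) sort key from a cursor."""
    distance_m, biz_id = json.loads(base64.urlsafe_b64decode(cursor))
    return (distance_m, biz_id)


def paginate(ranked, limit, cursor=None):
    """Return one page from a list sorted by (distanceMeters, id)."""
    start_key = decode_cursor(cursor) if cursor else None
    remaining = [
        b for b in ranked
        if start_key is None or (b["distanceMeters"], b["id"]) > start_key
    ]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = (
        encode_cursor(page[-1]["distanceMeters"], page[-1]["id"]) if has_more else None
    )
    return {"businesses": page, "nextCursor": next_cursor}
```

Including the business ID in the key breaks ties, so two businesses at the same distance are never skipped or repeated across pages.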
Create review
POST /api/v1/businesses/biz_123/reviews
Request:
{
  "rating": 5,
  "text": "Fast service and great espresso.",
  "visitId": "visit_789"
}
Response:
{
  "reviewId": "rev_456",
  "businessId": "biz_123",
  "rating": 5,
  "status": "published"
}
#High Level Design
The client sends search requests to an API gateway. The gateway routes the request to a regional search service. The search service expands the radius into geospatial cells, reads candidate business IDs from a geo index cache, then fetches business metadata from the business database.
Reviews live behind a separate review service. Search results usually need rating aggregates, not full review bodies, so aggregates can be denormalized onto business metadata while raw reviews remain in a review store.
#Detailed Design
Geohash cells
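A geohash prefix represents a rectangular area. Short prefixes cover large areas. Long prefixes cover small areas. To search within a radius, choose a precision whose cells are at least as large as the radius, then fetch candidates from the cell containing the search point and from its eight neighbors. The neighbors matter because a point near a cell edge can be closer to businesses in the adjacent cell than to most of its own.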
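To make the prefix behavior concrete, here is a minimal sketch of standard geohash encoding plus a simple way to collect a cell and its neighbors. The neighbor lookup re-encodes points shifted by one cell width or height, which is an illustrative shortcut rather than the table-driven method many libraries use. The function names are assumptions made for the example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lng, precision=6):
    """Encode a point by alternately bisecting longitude and latitude."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = (bits << 1) | 1, mid
            else:
                bits, lng_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def cell_size_deg(precision):
    """Return (height, width) of a cell in degrees at this precision."""
    total_bits = 5 * precision
    lng_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    return 180.0 / (1 << lat_bits), 360.0 / (1 << lng_bits)


def cell_and_neighbors(lat, lng, precision=6):
    """Geohashes of the cell containing the point and its 8 neighbors."""
    height, width = cell_size_deg(precision)
    cells = set()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nlat = min(max(lat + dy * height, -90.0), 90.0)   # clamp at poles
            nlng = (lng + dx * width + 180.0) % 360.0 - 180.0  # wrap at 180
            cells.add(geohash_encode(nlat, nlng, precision))
    return cells
```

Because each extra character only refines the previous cell, the 5-character hash of a point is always a prefix of its 7-character hash, which is what makes prefix range scans possible.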
Filtering
Apply filters after candidate retrieval. The geo index should narrow by location, but category, open hours, and price may live in the business metadata store or a search index.
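A post-retrieval filter step might look like the sketch below. It assumes each candidate carries hypothetical `category`, `price`, `rating`, `opens`, and `closes` fields, with hours expressed as `datetime.time` values in the business's local time. Real opening hours vary by weekday and holiday, which this example ignores.

```python
from datetime import time


def is_open(biz, now):
    """True if `now` falls inside the opening hours, including overnight spans."""
    opens, closes = biz["opens"], biz["closes"]
    if opens <= closes:
        return opens <= now < closes
    # Overnight hours such as 18:00 to 02:00 wrap past midnight.
    return now >= opens or now < closes


def apply_filters(candidates, category=None, max_price=None, min_rating=None, open_at=None):
    """Keep only candidates that pass every filter the caller supplied."""
    kept = []
    for biz in candidates:
        if category is not None and biz["category"] != category:
            continue
        if max_price is not None and biz["price"] > max_price:
            continue
        if min_rating is not None and biz["rating"] < min_rating:
            continue
        if open_at is not None and not is_open(biz, open_at):
            continue
        kept.append(biz)
    return kept
```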
Caching
Cache results for hot cell and category combinations, such as "restaurants near Times Square" or "coffee near downtown San Francisco." Use TTLs because businesses and ratings can change. The broader read path follows the same derived-index pattern described in Search Serving Architecture.
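A minimal in-process sketch of a TTL cache keyed by cell and category is shown below. The class and function names are hypothetical, and a real deployment would more likely use a shared cache; the point is only to show the expiry check and the read-through pattern.

```python
import time


class TTLCache:
    """Tiny cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def cached_cell_lookup(cache, cell, category, load):
    """Read-through lookup: serve from cache, else call `load` and store."""
    key = (cell, category)
    ids = cache.get(key)
    if ids is None:
        ids = load(cell, category)
        cache.put(key, ids)
    return ids
```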
Regional sharding
Shard by geography so local search touches local data. Cross-region replication can support travel queries, disaster recovery, and global map browsing. The routing tradeoffs are covered in Regional Routing and Geo-Partitioning.
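As a rough sketch of regional routing, the search point can be mapped to the closest regional cluster. The region names and anchor coordinates below are hypothetical, and production routing often uses explicit geographic boundaries or DNS-based routing instead of nearest-center distance.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

# Hypothetical regional clusters with an approximate anchor point each.
REGIONS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-northeast": (35.7, 139.7),
}


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_region(lat, lng):
    """Pick the region whose anchor point is closest to the search location."""
    return min(REGIONS, key=lambda name: haversine_m(lat, lng, *REGIONS[name]))
```

Routing on the search location rather than the user's home region means a traveler searching in Tokyo still reads from the cluster that holds Tokyo's businesses.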
#Common Interview Mistakes
- Scanning every business and computing distance for each row
- Forgetting neighboring geohash cells at cell boundaries
- Mixing raw review writes into the latency-sensitive search path
- Ignoring pagination for dense urban searches
- Sending all global traffic to one database region