Proximity Service (Yelp)

#Introduction

The interviewer says: "Design Yelp nearby search."

You propose storing a latitude and longitude column for every business and computing the distance to each row at query time. The follow-up is immediate: "What if there are 100 million businesses and the user only wants coffee within two miles?"

A proximity service is a geospatial retrieval problem. The system must quickly narrow a two-dimensional area into a small candidate set, then rank and filter those candidates.

#Functional Requirements

1. Nearby business search

Users search by latitude, longitude, radius, category, rating, price, and open-now filters
Results should include distance, rating, review count, and enough metadata to render a list
Results should be paginated

The core mistake is scanning every business and computing exact distance. The system needs a spatial index.

#Geospatial Indexing

Common approaches:

Geohash: encode latitude and longitude into sortable strings where nearby points often share prefixes
S2 or H3: divide the world into hierarchical cells
R-tree: index bounding boxes for spatial range queries
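
To make the shared-prefix idea concrete, here is a minimal sketch of standard geohash encoding. The function name `geohash_encode` and the default precision are illustrative choices, not part of any particular library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode a point by repeatedly halving the longitude and latitude ranges."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lng = True  # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits = (bits << 1) | 1
                lng_lo = mid
            else:
                bits <<= 1
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

Two points a block apart usually produce the same 5-character prefix, which is what lets a key-value store or B-tree serve "everything in this cell" as a prefix scan. Points on opposite sides of a cell edge can still be close while sharing almost no prefix, so prefix matching alone is not enough.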

A typical flow is:

Convert the search point and radius into covering cells
Read candidate business IDs from those cells
Fetch business metadata
Compute exact distance and apply filters
Rank and paginate the results
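
The five steps above can be sketched end to end. This is a simplified illustration, assuming an in-memory dictionary of businesses and a fixed-size latitude/longitude grid as a stand-in for geohash, S2, or H3 cells. The names `CELL_DEG`, `build_index`, and `search_nearby` are hypothetical, and the sketch ignores the antimeridian and the poles.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000
CELL_DEG = 0.01  # hypothetical cell size in degrees (about 1.1 km of latitude)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlng = p2 - p1, math.radians(lng2 - lng1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def cell_of(lat, lng):
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))

def build_index(businesses):
    """Map each cell to the IDs of the businesses located in it."""
    index = defaultdict(list)
    for biz_id, biz in businesses.items():
        index[cell_of(biz["lat"], biz["lng"])].append(biz_id)
    return index

def covering_cells(lat, lng, radius_m):
    """All grid cells touched by the bounding box of the search circle."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlng = dlat / max(math.cos(math.radians(lat)), 1e-6)
    r0, c0 = cell_of(lat - dlat, lng - dlng)
    r1, c1 = cell_of(lat + dlat, lng + dlng)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def search_nearby(index, businesses, lat, lng, radius_m, category=None, limit=20):
    hits = []
    for cell in covering_cells(lat, lng, radius_m):       # 1. covering cells
        for biz_id in index.get(cell, []):                # 2. candidate IDs
            biz = businesses[biz_id]                      # 3. metadata
            dist = haversine_m(lat, lng, biz["lat"], biz["lng"])
            # 4. exact distance check plus filters
            if dist <= radius_m and category in (None, biz["category"]):
                hits.append((dist, biz_id))
    hits.sort()                                           # 5. rank by distance
    return [biz_id for _, biz_id in hits[:limit]]
```

The cell lookup only has to be approximately right, because the exact distance check in step 4 discards any candidate that falls inside a covering cell but outside the circle.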

2. Reviews and ratings

Users can read business reviews
Users can leave star ratings and text reviews
Business search results should show rating aggregates

Reviews are read-heavy. Store raw reviews separately and maintain denormalized aggregates like average rating and review count on the business profile.
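
One way to keep the aggregate cheap to maintain is to store a running sum and count rather than recomputing from raw reviews. The sketch below assumes whole-star ratings from 1 to 5 and a hypothetical `RatingAggregate` record stored alongside the business profile.

```python
from dataclasses import dataclass

@dataclass
class RatingAggregate:
    rating_sum: int = 0
    review_count: int = 0

    def add(self, stars: int) -> None:
        """Fold one new review into the aggregate in O(1)."""
        if not 1 <= stars <= 5:  # assumed rating scale
            raise ValueError("stars must be between 1 and 5")
        self.rating_sum += stars
        self.review_count += 1

    @property
    def average(self) -> float:
        """Average rating rounded to one decimal, or 0.0 with no reviews."""
        if self.review_count == 0:
            return 0.0
        return round(self.rating_sum / self.review_count, 1)
```

Storing the sum instead of the average avoids accumulating rounding error across many updates, and the update can run asynchronously from a queue of new reviews.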

#Non-Functional Requirements

Read-heavy performance

Business locations change slowly, but users search constantly. Cache hot geospatial cells, popular category filters, and business profile summaries.

Global availability

Most searches are geographically local. A Tokyo user searching for sushi should not need to hit a US-East database. Partition data by region and use regional routing so search traffic reaches nearby clusters.

Freshness

New businesses and address corrections should appear quickly, but search results do not need to be perfectly fresh: short cache TTLs keep staleness bounded. Review aggregate updates can be asynchronous.

Ranking quality

Distance is only one ranking signal. Relevance may also include category match, rating, review count, open hours, promoted listings, and personalization.

#API Design

Nearby search

GET /api/v1/businesses/search?lat=40.741&lng=-73.989&radiusMeters=2000&category=coffee&limit=20

Response:

{
  "businesses": [
    {
      "id": "biz_123",
      "name": "North Star Coffee",
      "distanceMeters": 312,
      "rating": 4.6,
      "reviewCount": 871
    }
  ],
  "nextCursor": "cell_abc:score_312"
}
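
The `nextCursor` field implies keyset pagination: the cursor records where the previous page ended so the next request resumes after it. The sketch below is one possible encoding, assuming results are ranked by `(distance, id)`. The base64 JSON format and the function names are hypothetical and differ from the cursor string shown in the sample response.

```python
import base64
import json

def encode_cursor(distance_m, biz_id):
    """Pack the last item's sort key into an opaque, URL-safe token."""
    raw = json.dumps([distance_m, biz_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor):
    distance_m, biz_id = json.loads(base64.urlsafe_b64decode(cursor))
    return (distance_m, biz_id)

def page(ranked, limit, cursor=None):
    """Return one page from `ranked`, a list of (distance_m, biz_id) sorted ascending."""
    after = decode_cursor(cursor) if cursor else None
    # Keep only rows strictly after the last item of the previous page.
    rows = [row for row in ranked if after is None or row > after]
    items = rows[:limit]
    next_cursor = encode_cursor(*items[-1]) if len(rows) > limit else None
    return items, next_cursor
```

Including the business ID in the sort key breaks ties between businesses at the same distance, so no result is skipped or repeated across pages.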

Create review

POST /api/v1/businesses/biz_123/reviews

Request:

{
  "rating": 5,
  "text": "Fast service and great espresso.",
  "visitId": "visit_789"
}

Response:

{
  "reviewId": "rev_456",
  "businessId": "biz_123",
  "rating": 5,
  "status": "published"
}

#High Level Design

The client sends search requests to an API gateway. The gateway routes the request to a regional search service. The search service expands the radius into geospatial cells, reads candidate business IDs from a geo index cache, then fetches business metadata from the business database.

Reviews live behind a separate review service. Search results usually need rating aggregates, not full review bodies, so aggregates can be denormalized onto business metadata while raw reviews remain in a review store.

#Detailed Design

Geohash cells

A geohash prefix represents a rectangular area: short prefixes cover large areas and long prefixes cover small ones. To search within a radius, choose a precision whose cells are at least as large as the radius, then fetch candidates from the cell containing the search point and its eight neighbors.
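
Choosing the precision can be done from the cell dimensions. The sketch below assumes a flat approximation of about 111,320 meters per degree and picks the longest prefix whose cell is still at least as wide and as tall as the radius, so that a 3x3 block of cells around the search point covers the whole circle. The function names are illustrative.

```python
import math

METERS_PER_DEGREE = 111_320  # approximate length of one degree of latitude

def cell_size_m(precision, lat):
    """Approximate (width, height) in meters of a geohash cell at this latitude."""
    total_bits = 5 * precision
    lng_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    height = 180 / 2 ** lat_bits * METERS_PER_DEGREE
    width = 360 / 2 ** lng_bits * METERS_PER_DEGREE * math.cos(math.radians(lat))
    return width, height

def precision_for_radius(radius_m, lat, max_precision=12):
    """Longest precision whose cells are at least radius_m on both sides."""
    best = 1  # falls back to the coarsest cells for very large radii
    for precision in range(1, max_precision + 1):
        if min(cell_size_m(precision, lat)) >= radius_m:
            best = precision
        else:
            break  # cells only shrink as precision grows
    return best
```

A longer prefix means fewer false candidates per cell but more cells to read, so some systems instead use a finer precision with a larger covering set. This sketch fixes the covering at nine cells for simplicity.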

Filtering

Apply filters after candidate retrieval. The geo index should narrow by location, but category, open hours, and price may live in the business metadata store or a search index.

Caching

Cache hot cell and filter combinations, such as restaurants in the cells around Times Square or coffee in downtown San Francisco. Use TTLs because businesses and ratings can change. The broader read path follows the same derived-index pattern described in Search Serving Architecture.
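
A per-cell cache with expiry can be very small. The following is a minimal in-process sketch, assuming keys such as `(cell, category)` and a hypothetical `TTLCache` class. A production system would more likely use a shared cache, but the expiry logic is the same idea.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # drop stale entries lazily on read
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)
```

On a miss, the search service would read the cell from the geo index and call `put` with the candidate IDs. The TTL bounds how long a closed or relocated business can linger in results.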

Regional sharding

Shard by geography so local search touches local data. Cross-region replication can support travel queries, disaster recovery, and global map browsing. The routing tradeoffs are covered in Regional Routing and Geo-Partitioning.
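
As a rough illustration of geography-based routing, the sketch below sends each search to the region whose center is closest to the search point. The region names and coordinates are made-up placeholders, and nearest-center selection is a simplification: real deployments typically route on explicit region boundaries or at the DNS or gateway layer.

```python
import math

# Hypothetical regions with approximate center coordinates (lat, lng).
REGIONS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-northeast": (35.7, 139.7),
}

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlng = p2 - p1, math.radians(lng2 - lng1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlng / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))

def route_region(lat, lng, regions=REGIONS):
    """Pick the region whose center is nearest to the search point."""
    return min(regions, key=lambda name: haversine_km(lat, lng, *regions[name]))
```

Routing on the search location rather than the user's home region keeps travel queries local to the data being searched.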

#Common Interview Mistakes

Scanning every business and computing distance for each row
Forgetting neighboring geohash cells at cell boundaries
Mixing raw review writes into the latency-sensitive search path
Ignoring pagination for dense urban searches
Sending all global traffic to one database region