#Introduction
You are designing Yelp nearby search. The user asks for coffee within two miles.
You write:
SELECT *
FROM businesses
ORDER BY distance(latitude, longitude, :lat, :lng)
LIMIT 20;
The interviewer asks: "How many rows did you just scan?"
If the answer is "all of them," the design is dead. A proximity service needs a way to turn a latitude/longitude radius into a small candidate set before it calculates exact distance.
That is what geospatial indexing does.
#The Radius Query
A nearby search has two parts:
- Candidate retrieval: find businesses that might be inside the radius.
- Exact filtering: compute true distance and remove false positives.
The candidate step needs an index. The exact step needs math.
User location + radius
-> covering cells or spatial range
-> candidate business ids
-> exact distance filter
-> ranking and pagination
The index is allowed to return extra candidates. It is not allowed to miss nearby businesses.
#Geohash and Cell Covering
Geohash encodes latitude and longitude into a string. Nearby points often share a prefix.
Business A: dr5ru7
Business B: dr5ru8
Business C: 9q8yyk
If the user searches in New York, the service can look up geohash prefixes around New York instead of scanning businesses in Tokyo, Paris, and Chicago.
Longer prefixes mean smaller cells:
| Prefix length | Cell size | Use case |
|---|---|---|
| Short | large area | coarse regional filtering |
| Medium | city/neighborhood scale | common nearby search |
| Long | small area | dense urban precision |
A typical flow:
- Pick geohash precision based on radius.
- Find the cell containing the user.
- Include neighboring cells so boundary results are not missed.
- Fetch business ids from those cells.
- Compute exact distance for each candidate.
Neighbor cells matter. If the user is standing near a cell boundary, the closest coffee shop may be in the next cell.
#S2, H3, and R-trees
Geohash is not the only option.
S2 maps the earth onto a sphere and divides it into hierarchical cells. It handles spherical geometry better than rectangular latitude/longitude boxes.
H3 uses hexagonal cells. Hexagons have more uniform neighbors than squares, which is useful for analytics and coverage.
R-trees index bounding boxes. They are common inside spatial databases and can support rectangle and polygon queries.
The interview-level distinction:
| Approach | Best for | Tradeoff |
|---|---|---|
| Geohash | simple prefix lookup, key-value stores | rectangular cells and boundary care |
| S2/H3 | hierarchical global cells | more library-specific concepts |
| R-tree | database-native spatial range queries | harder to distribute manually |
If you are designing an app-level service with Redis or a key-value store, geohash/S2/H3 cells are easy to shard and cache. If you are designing inside PostGIS, an R-tree-backed spatial index may be the right answer.
#Filtering and Ranking
The geo index gives candidates, not final results.
After candidate retrieval:
- compute exact distance with a spherical formula
- filter by radius
- filter by category, price, rating, open hours
- rank by distance plus relevance
- paginate results
For example:
Candidates from cells: 4,800 businesses
After exact radius: 1,250 businesses
After filters: 180 businesses
Return: 20 businesses
Ranking is usually not pure distance. A restaurant 900 meters away with 4.8 stars may outrank a 400 meter result with no reviews. The geo index should narrow the set; the ranking layer should make the product decision.
#Common Interview Mistakes
Mistake 1: Scanning every business.
Distance calculation is CPU work. Do it after narrowing candidates, not before.
Mistake 2: Forgetting neighboring cells.
Cell boundaries are artificial. Real users do not care which side of the boundary a business falls on.
Mistake 3: Choosing one fixed precision for every radius.
A two-block search and a fifty-mile search need different cell sizes.
Mistake 4: Treating the spatial index as final truth.
Cells are approximations. Always do exact distance filtering.
Mistake 5: Ignoring data locality.
Most searches are local. Pair geospatial indexing with regional routing and geo-partitioning.
#Summary: What to Remember
Geospatial indexing turns a radius search into a small candidate lookup.
Geohash, S2, H3, and R-trees all solve the same core problem: avoid scanning every latitude/longitude row. They differ in cell shape, library support, and how naturally they distribute.
In interviews, describe the full pipeline: cover the radius with cells, include neighbors, fetch candidates, compute exact distance, filter, rank, and paginate. That is the answer behind Yelp, Google Maps local search, delivery apps, and ride-hailing dispatch.