Geospatial Indexing

#Introduction

You are designing Yelp nearby search. The user asks for coffee within two miles.

You write:

SELECT *
FROM businesses
ORDER BY distance(latitude, longitude, :lat, :lng)
LIMIT 20;

The interviewer asks: "How many rows did you just scan?"

If the answer is "all of them," the design is dead. A proximity service needs a way to turn a latitude/longitude radius into a small candidate set before it calculates exact distance.

That is what geospatial indexing does.

#The Radius Query

A nearby search has two parts:

Candidate retrieval: find businesses that might be inside the radius.
Exact filtering: compute true distance and remove false positives.

The candidate step needs an index. The exact step needs math.

User location + radius
  -> covering cells or spatial range
  -> candidate business ids
  -> exact distance filter
  -> ranking and pagination

The index is allowed to return extra candidates. It is not allowed to miss nearby businesses.

#Geohash and Cell Covering

Geohash encodes latitude and longitude into a string. Nearby points often share a prefix.

Business A: dr5ru7
Business B: dr5ru8
Business C: 9q8yyk

If the user searches in New York, the service can look up geohash prefixes around New York instead of scanning businesses in Tokyo, Paris, and Chicago.

Longer prefixes mean smaller cells:

Prefix length	Cell size	Use case
Short	large area	coarse regional filtering
Medium	city/neighborhood scale	common nearby search
Long	small area	dense urban precision

A typical flow:

Pick geohash precision based on radius.
Find the cell containing the user.
Include neighboring cells so boundary results are not missed.
Fetch business ids from those cells.
Compute exact distance for each candidate.

Neighbor cells matter. If the user is standing near a cell boundary, the closest coffee shop may be in the next cell.

#S2, H3, and R-trees

Geohash is not the only option.

S2 maps the earth onto a sphere and divides it into hierarchical cells. It handles spherical geometry better than rectangular latitude/longitude boxes.

H3 uses hexagonal cells. Hexagons have more uniform neighbors than squares, which is useful for analytics and coverage.

R-trees index bounding boxes. They are common inside spatial databases and can support rectangle and polygon queries.

The interview-level distinction:

Approach	Best for	Tradeoff
Geohash	simple prefix lookup, key-value stores	rectangular cells and boundary care
S2/H3	hierarchical global cells	more library-specific concepts
R-tree	database-native spatial range queries	harder to distribute manually

If you are designing an app-level service with Redis or a key-value store, geohash/S2/H3 cells are easy to shard and cache. If you are designing inside PostGIS, an R-tree-backed spatial index may be the right answer.

#Filtering and Ranking

The geo index gives candidates, not final results.

After candidate retrieval:

compute exact distance with a spherical formula
filter by radius
filter by category, price, rating, open hours
rank by distance plus relevance
paginate results

For example:

Candidates from cells: 4,800 businesses
After exact radius:    1,250 businesses
After filters:           180 businesses
Return:                   20 businesses

Ranking is usually not pure distance. A restaurant 900 meters away with 4.8 stars may outrank a 400 meter result with no reviews. The geo index should narrow the set; the ranking layer should make the product decision.

#Common Interview Mistakes

Mistake 1: Scanning every business.

Distance calculation is CPU work. Do it after narrowing candidates, not before.

Mistake 2: Forgetting neighboring cells.

Cell boundaries are artificial. Real users do not care which side of the boundary a business falls on.

Mistake 3: Choosing one fixed precision for every radius.

A two-block search and a fifty-mile search need different cell sizes.

Mistake 4: Treating the spatial index as final truth.

Cells are approximations. Always do exact distance filtering.

Mistake 5: Ignoring data locality.

Most searches are local. Pair geospatial indexing with regional routing and geo-partitioning.

#Summary: What to Remember

Geospatial indexing turns a radius search into a small candidate lookup.

Geohash, S2, H3, and R-trees all solve the same core problem: avoid scanning every latitude/longitude row. They differ in cell shape, library support, and how naturally they distribute.

In interviews, describe the full pipeline: cover the radius with cells, include neighbors, fetch candidates, compute exact distance, filter, rank, and paginate. That is the answer behind Yelp, Google Maps local search, delivery apps, and ride-hailing dispatch.