Regional Routing and Geo-Partitioning

#Introduction

You are designing a proximity service. A user in Tokyo searches for sushi.

If that request goes to a server in Virginia, which queries a database in Virginia, the system may still be correct. It will also feel slow.

Regional routing and geo-partitioning are about serving each request close to the user who sent it and the data it needs. They show up in local search, ride-hailing, collaborative tools, multiplayer games, and global APIs.

#Route by User vs Route by Data

There are two common routing goals.

Route by user location

Send the user to the nearest healthy region.

This minimizes network latency for generic APIs:

Toronto user -> us-east
Paris user   -> eu-west
Tokyo user   -> ap-northeast
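
A minimal sketch of "nearest healthy region", assuming the router already has a latency estimate per region. The table, its numbers, and the function name are made up for illustration; a real system would measure latency or rely on DNS or anycast routing.

```python
from typing import Dict, Optional, Set

# Hypothetical round-trip latencies in milliseconds, for illustration only.
LATENCY_MS: Dict[str, Dict[str, int]] = {
    "toronto": {"us-east": 20, "eu-west": 95, "ap-northeast": 160},
    "paris": {"us-east": 85, "eu-west": 10, "ap-northeast": 230},
    "tokyo": {"us-east": 160, "eu-west": 230, "ap-northeast": 8},
}


def nearest_healthy_region(client_city: str, healthy: Set[str]) -> Optional[str]:
    """Return the lowest-latency region that is currently healthy, or None."""
    candidates = {
        region: ms
        for region, ms in LATENCY_MS[client_city].items()
        if region in healthy
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

With every region healthy, the Tokyo user lands in ap-northeast. If ap-northeast is marked unhealthy, the same call falls back to the next-closest region in the table.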

Route by data location

Send the request to the region that owns the data.

This matters when the query is about a place, room, tenant, or document:

User in Toronto searches Tokyo restaurants -> route to Asia search cluster
User in Paris opens US tenant data         -> route to tenant's home region

For proximity search, route by the searched location. For account settings, route by the user's home region. For collaborative documents, route to the region that owns the document's room.
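
The three rules above can be sketched as a single routing function. This is an illustration, not a prescribed design: the `Request` fields and the `kind` strings are hypothetical names, and the sketch assumes something upstream has already resolved the searched location and room owner to a region.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    kind: str                                # "proximity_search", "account_settings", "document", or other
    nearest_region: str                      # nearest healthy region to the caller
    user_home_region: str                    # region that owns the user's account
    searched_region: Optional[str] = None    # region covering the searched location
    room_owner_region: Optional[str] = None  # region that owns the document room


def pick_region(req: Request) -> str:
    """Route by data when the request names a place, account, or room; else by user."""
    if req.kind == "proximity_search" and req.searched_region:
        return req.searched_region
    if req.kind == "account_settings":
        return req.user_home_region
    if req.kind == "document" and req.room_owner_region:
        return req.room_owner_region
    # Generic API call: nothing ties it to a data owner, so minimize latency.
    return req.nearest_region
```

A Toronto user searching Tokyo restaurants has `nearest_region="us-east"` but `searched_region="ap-northeast"`, so the request goes to Asia.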

#Geo-Partitioning

Geo-partitioning splits data by geography:

country
region
city
geospatial cell
operational territory

The benefit is locality. A search for Tokyo businesses should hit Tokyo or Asia shards. A driver dispatch query for Manhattan should not scan Los Angeles.

This pairs naturally with geospatial indexing:

Region router
  -> Asia cluster
  -> Tokyo geo cells
  -> nearby business candidates

The shard key should match query patterns. If almost every query is local, geography is a strong partition key. If users frequently query globally, partitioning by geography alone turns those queries into scatter-gather across many shards.
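
To make the locality argument concrete, here is a rough sketch that uses a fixed latitude/longitude grid as the shard key. The grid is a stand-in for whatever geospatial index is actually used, `CELL_DEG` is an arbitrary assumed cell size, and the sketch ignores boxes that cross the antimeridian.

```python
import math
from typing import Set, Tuple

CELL_DEG = 1.0  # hypothetical cell size in degrees

Cell = Tuple[int, int]


def cell_for(lat: float, lon: float) -> Cell:
    """Map a point to the grid cell (shard key) that contains it."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def cells_for_box(min_lat: float, min_lon: float,
                  max_lat: float, max_lon: float) -> Set[Cell]:
    """Return every cell a bounding-box query has to touch."""
    lo = cell_for(min_lat, min_lon)
    hi = cell_for(max_lat, max_lon)
    return {
        (i, j)
        for i in range(lo[0], hi[0] + 1)
        for j in range(lo[1], hi[1] + 1)
    }
```

A small box around central Tokyo touches one cell. A box spanning ten degrees in each direction touches over a hundred, which is the scatter-gather cost of asking a global question of a geographically partitioned store.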

#Replication and Failover

Geo-partitioning creates a failure question:

"What happens if the region that owns the data is down?"

Options:

fail closed for strong correctness
serve read-only from a replica
route to a backup region with stale data
degrade features that require fresh local writes

For a proximity service, stale business search results may be acceptable for a short period. For money movement, acting on stale balances or losing writes is not acceptable. For collaborative editing, moving an active document room during a region outage is possible but operationally hard.

The failover policy should match the product's consistency requirement.
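
One way to express such a policy is as a small decision function. This is a sketch under simplifying assumptions: only two consistency classes, one replica, and hypothetical names for every enum member and parameter.

```python
from enum import Enum


class Consistency(Enum):
    STRICT = "strict"      # e.g. money movement
    STALE_OK = "stale_ok"  # e.g. business search


class Action(Enum):
    SERVE_FROM_OWNER = "serve_from_owner"
    FAIL_CLOSED = "fail_closed"
    SERVE_STALE_READ = "serve_stale_read"
    REJECT_WRITE = "reject_write"


def failover_action(owner_healthy: bool, is_write: bool,
                    consistency: Consistency, replica_available: bool) -> Action:
    """Pick a behavior for one request when the owning region may be down."""
    if owner_healthy:
        return Action.SERVE_FROM_OWNER
    if consistency is Consistency.STRICT:
        # Strong correctness: refuse rather than risk a wrong answer.
        return Action.FAIL_CLOSED
    if is_write:
        # Stale-tolerant data is served read-only during the outage.
        return Action.REJECT_WRITE
    if replica_available:
        return Action.SERVE_STALE_READ
    return Action.FAIL_CLOSED
```

The order of the checks is the policy: strict data fails closed before anything else is considered, and writes are never redirected to a replica in this sketch.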

#Active-Active Tradeoffs

Active-active means multiple regions accept traffic at the same time.

It sounds ideal. It is also harder than active-passive because writes can conflict.

Good active-active candidates:

read-heavy local search
append-only event ingestion with regional keys
caches and derived indexes
systems with CRDT-style merge rules

Risky active-active candidates:

financial ledgers
inventory decrement
single-owner document operation sequencing
anything needing strict global uniqueness without coordination

If you make a system active-active, explain the conflict rule. Otherwise, active-active is just a buzzword.
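
As one example of an explicit conflict rule, here is a minimal grow-only counter (G-Counter), a standard CRDT. Each region increments only its own slot, and merge takes the per-region maximum, so replicas converge regardless of merge order. The class and region names are illustrative.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one slot per region, merged by per-slot max."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.region] = self.counts.get(self.region, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Max per slot is commutative, associative, and idempotent,
        # so repeated or reordered merges give the same result.
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)
```

This works because increments never conflict. It does not help with the risky cases listed above: an inventory decrement needs a bound check that a per-region slot cannot enforce without coordination.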

#Common Interview Mistakes

Mistake 1: Always routing to the region nearest the user.

Sometimes the data is somewhere else. Route by the thing being queried.

Mistake 2: Saying "multi-region" without a write policy.

Who accepts writes? How are conflicts resolved? What happens during failover?

Mistake 3: Ignoring cross-region reads.

Travelers, global admins, and analytics jobs often read data across regional boundaries.

Mistake 4: Overusing active-active.

Active-active is useful when conflicts are manageable. It is dangerous when correctness needs one owner.

Mistake 5: Forgetting data residency.

Some data cannot legally or contractually leave a region.
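
A residency constraint can be sketched as a filter on failover targets. The tenant names, the policy table, and the function are hypothetical; the point is only that "healthy" is not the same as "allowed".

```python
from typing import Dict, List, Optional, Set

# Hypothetical policy: regions where each tenant's data may be stored or served.
ALLOWED_REGIONS: Dict[str, Set[str]] = {
    "tenant-eu": {"eu-west", "eu-central"},
    "tenant-us": {"us-east", "us-west"},
}


def pick_failover_region(tenant: str, preferred: List[str],
                         healthy: Set[str]) -> Optional[str]:
    """Return the first preferred region that is both healthy and permitted."""
    allowed = ALLOWED_REGIONS[tenant]
    for region in preferred:
        if region in healthy and region in allowed:
            return region
    # No compliant region is up: fail rather than move data out of bounds.
    return None
```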

#Summary: What to Remember

Regional routing decides where a request should go. Geo-partitioning decides where data should live.

Route by user for generic latency. Route by data for location-scoped, tenant-scoped, or room-scoped operations. Replicate for availability, but define whether failover is read-only, stale, or writable.

In interviews, never just say "deploy globally." Say which region owns the data, how requests find that owner, and what happens when that region fails.