Introduction
Alright, so you walk into that system design interview and the interviewer says: "Design a URL shortening service like TinyURL or Bitly."
You're thinking, "Oh great, this sounds straightforward." Take a long URL, make it short, store it, done. Easy.
Then they hit you with: "How do you handle 10 million redirects per day? How do you guarantee unique short codes? What happens when two users try to shorten the same URL at the exact same time?"
And suddenly you're realizing this isn't as simple as you thought.
This walkthrough covers a clean architecture that works at scale, using pre-generated codes and separate read/write servers. This is a pattern you'll see in real production systems.
Functional Requirements
Let's be clear about what we need to support:
1. Shorten a URL
- User provides a long URL
- System returns a unique short URL
- Example:
https://www.amazon.com/some-really-long-product-url becomes tiny.url/12345678
2. Redirect to original URL
- User visits the short URL
- System redirects them to the original long URL
- Should be fast and reliable
3. Short codes must be unique
- No two long URLs should get the same short code
- Once assigned, a short code always points to the same long URL
That's it. Two main operations. Don't overcomplicate it in the interview.
Non-Functional Requirements
Low latency for redirects
- Nobody wants to wait more than half a second for a redirect.
- Redirects are the critical path for user experience, so our target is under 100ms for the lookup.
High read traffic
- Redirects happen way more often than URL creation
- Read-to-write ratio is often 10:1 or higher
- Need to scale reads independently from writes
- Example: 100 million redirects/day vs 10 million created URLs/day
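To make the example volumes concrete, here is a rough back-of-the-envelope conversion from daily totals to average requests per second. It assumes traffic is spread evenly across the day, which real traffic is not, so treat the results as averages rather than peak figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: int) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


redirects_per_sec = per_second(100_000_000)  # about 1,157 reads/s
creates_per_sec = per_second(10_000_000)     # about 116 writes/s
read_write_ratio = redirects_per_sec / creates_per_sec  # the 10:1 ratio

print(f"{redirects_per_sec:.0f} redirects/s, {creates_per_sec:.0f} creates/s")
```

Even the read side averages only about a thousand requests per second, which is why caching and a handful of read servers go a long way.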
Unique short codes (guaranteed)
- Can't have two different URLs with the same short code
- This is critical - collisions break the entire system
- Must work even under concurrent requests
High availability
- Service needs to be up 99.9%+ of the time
- A link that doesn't work is worse than no link at all
- Should handle server failures gracefully
Scalability
- Start with thousands of URLs
- Scale to billions of URLs
- Handle millions of redirects per day
API Design
Keep it simple. Two REST APIs that map directly to our functional requirements:
1. Create Short URL
POST /api/v1/urls
Request Body:
{
"longUrl": "https://www.amazon.com/some-product-url",
"custom_alias": "optional",
"expiration_date": "optional"
}
Response:
{
"shortUrl": "tiny.url/12345678"
}
Status: 201 Created
2. Redirect to Original URL
GET /:slug
Example: GET /12345678
Response:
HTTP 301 (or 302) Redirect
Location: https://www.amazon.com/some-product-url
Status: 302 Found (or 301 Moved Permanently)
Why 301 vs 302?
- 301 (Permanent): Browsers cache the redirect, so repeat visits are faster, but cached visits never reach your server and can't be counted
- 302 (Temporary): Every redirect hits your server, which is slower but lets you track every click
Most URL shorteners use 302 for analytics.
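As a minimal sketch of the redirect endpoint's behavior, the function below returns the status code and headers for GET /:slug. The URLS dictionary and the redirect_response name are hypothetical stand-ins for the real lookup, and the 404 for unknown slugs is an assumption the API above doesn't spell out.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory mapping standing in for the cache/database lookup.
URLS: Dict[str, str] = {"12345678": "https://www.amazon.com/some-product-url"}


def redirect_response(slug: str, permanent: bool = False) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for GET /:slug."""
    long_url: Optional[str] = URLS.get(slug)
    if long_url is None:
        return 404, {}  # assumed behavior for unknown slugs
    # 302 keeps every click flowing through our servers (analytics);
    # 301 lets browsers cache the redirect and skip us on repeat visits.
    status = 301 if permanent else 302
    return status, {"Location": long_url}
```

Calling redirect_response("12345678") yields a 302 with the Location header set to the long URL, matching the example response above.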
High Level Design
Here's the overall architecture:
Key Components
1. Load Balancer / API Gateway
- Single entry point for all requests
- Routes POST /api/v1/urls to the write server
- Routes GET /:slug to the read servers (round-robin)
- Can cache hot links
2. Write Server (Single Instance)
- Handles all URL shortening requests
- Assigns pre-generated short codes
- Writes to PostgreSQL master
- Low traffic, so one server is usually enough
3. Read Servers (Multiple Instances)
- Handle all redirect requests
- Stateless and horizontally scalable (see Scaling: Vertical vs Horizontal for details)
- Query Redis cache first, then database
- High traffic, so we scale these out
4. PostgreSQL Database
- Master handles all writes
- Read replicas handle all reads
- Stores URL mappings and pre-generated codes
- Simple schema, indexed for fast lookups
5. Redis Cache
- Caches popular short codes
- Reduces database load by 90%+
- Keeps latency under 1ms for hot links
6. PostgreSQL Database Storage (Pre-Generated Codes)
- Stores the full set of pre-generated short codes
- Keeps the main database small, so queries run against fewer rows
7. Background Service
- Periodically moves pre-generated codes into the main database for use
- Moves expired codes back into the pre-generated pool
- This spares the write server from generating unique codes at request time, since they are all pre-generated, and keeps queries running against a small number of rows
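One pass of the background service could look roughly like the sketch below. It models both stores as plain Python collections; the names run_maintenance, LOW_WATERMARK, and BATCH_SIZE, along with their values, are assumptions for illustration and not part of the design above.

```python
import time
from typing import Dict, List, Optional

LOW_WATERMARK = 3  # assumed threshold: refill when fewer free codes remain
BATCH_SIZE = 5     # assumed number of codes moved per refill


def run_maintenance(reserve: List[str], free_codes: List[str],
                    active: Dict[str, dict], now: Optional[float] = None) -> None:
    """One pass of the background job: recycle expired codes, then top up."""
    now = time.time() if now is None else now
    # 1. Move expired codes out of the main database and back into the reserve.
    expired = [code for code, row in active.items() if row["expires_at"] <= now]
    for code in expired:
        del active[code]
        reserve.append(code)
    # 2. Top up the main database's free pool when it runs low.
    if len(free_codes) < LOW_WATERMARK:
        for _ in range(min(BATCH_SIZE, len(reserve))):
            free_codes.append(reserve.pop())
```

In a real deployment this would run on a schedule and operate on the two databases in batches, but the two steps stay the same.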
For more on when and how to use caching effectively, see our guide on Databases & Caching.
Why This Architecture Works
Separation of concerns:
- Write and read paths are completely separate
- Can scale them independently based on traffic
Optimized for read-heavy workload:
- Multiple read servers
- Caching layer
- Database read replicas
Simple and proven:
- No complex distributed coordination
- Easy to reason about
- Used by real production URL shorteners
Detailed Design
Now let's dive into the specifics of how each piece works.
Pre-Generated Short Codes (The Clever Part)
Instead of generating a unique code every time someone requests a short URL, we pre-generate millions of codes ahead of time.
How it works:
- Pre-generate a pool of codes: Generate codes using either:
- 8 characters from [a-z] (26 characters) = 26^8 ≈ 2.1 × 10^11 combinations, or
- 7 characters from [a-z][A-Z][0-9] (Base62, 62 characters) = 62^7 ≈ 3.5 × 10^12 combinations
- Store with a flag: Each code has an is_used flag in the database
- Assign on demand: When a user wants to shorten a URL, grab an unused code, mark it as used, and associate it with the long URL
- Use a second database: If we store every pre-generated code in our main database, the extra rows will increase read latency. Instead, we keep the full pre-generated list in a second database and move 'free' codes into the smaller main database, which is the one we actually read from and write to.
- Use a background service: A background service runs periodically to top up the main database with free codes whenever its supply drops below a threshold, and to reset is_used to false for codes that have expired.
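The generate-then-assign flow can be sketched in a few lines. This is an in-memory illustration only: generate_pool and CodePool are hypothetical names, random Base62 codes are one possible generation scheme (the design doesn't prescribe one), and a real system would keep the pool in the database rather than in a list.

```python
import secrets
import string
from typing import Dict, List, Set

BASE62 = string.ascii_letters + string.digits  # 62 characters
CODE_LENGTH = 7


def generate_pool(size: int) -> List[str]:
    """Pre-generate `size` distinct random Base62 codes."""
    codes: Set[str] = set()
    while len(codes) < size:  # the set drops any duplicate draw
        codes.add("".join(secrets.choice(BASE62) for _ in range(CODE_LENGTH)))
    return list(codes)


class CodePool:
    """In-memory stand-in for the table of codes with an is_used flag."""

    def __init__(self, codes: List[str]) -> None:
        self.free = list(codes)
        self.assigned: Dict[str, str] = {}

    def shorten(self, long_url: str) -> str:
        code = self.free.pop()          # grab an unused code
        self.assigned[code] = long_url  # mark it used and attach the long URL
        return code
```

Uniqueness is settled at generation time, so the request path never has to check for collisions.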
Why this approach wins:
- No collisions: Every code is pre-generated and unique
- No runtime computation: Just grab the next available code (simple SELECT and UPDATE)
- Simple and fast: Single database transaction, predictable performance, also smaller database to query against.
- Massive capacity: 10^11 to 10^12 combinations provides enormous headroom
The math:
The two approaches give us roughly 2 × 10^11 and 3.5 × 10^12 possible codes respectively - that's hundreds of billions to trillions of combinations. For detailed calculations on how we arrive at these numbers using the golden rule (2^10 ≈ 10^3), see our guide on How to Calculate Throughput & Database Size.
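The exact keyspace sizes are easy to check, along with how long each would last. The 10 million new URLs per day figure is taken from the example in the non-functional requirements; the years-to-exhaust numbers assume that rate stays constant and that no codes are recycled.

```python
lowercase_8 = 26 ** 8  # 208,827,064,576   (about 2.1 x 10^11)
base62_7 = 62 ** 7     # 3,521,614,606,208 (about 3.5 x 10^12)

URLS_PER_DAY = 10_000_000  # example creation rate from the requirements

years_lowercase = lowercase_8 / URLS_PER_DAY / 365  # roughly 57 years
years_base62 = base62_7 / URLS_PER_DAY / 365        # roughly 965 years

print(f"{lowercase_8:,} codes last about {years_lowercase:.0f} years")
print(f"{base62_7:,} codes last about {years_base62:.0f} years")
```

Base62 buys about 17 times more codes with one fewer character, which is the usual argument for preferring it.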
Scaling Servers
The key insight: Reads and writes have totally different characteristics.
Read Pattern (Redirects)
- Volume: 10-100x more than writes
- Latency requirement: Super low (under 10ms)
- Pattern: GET operations, read-only
How to scale:
- Horizontal scaling (add more read servers) - see Scaling: Vertical vs Horizontal for details
- Caching (Redis for hot links)
- Database read replicas
- CDN for extremely popular links
Write Pattern (Shortening)
- Volume: Much lower
- Latency requirement: Can be 50-100ms, users don't care as much
- Pattern: POST operations, needs to write to database
How to scale:
- Usually one server is enough
- If needed, can have a small pool of write servers
This asymmetry is why we separate them.
Caching Strategy
The problem: Even with read replicas, hitting the database for every redirect adds latency.
The solution: Cache popular short codes in Redis using the cache-aside pattern. For a deep dive on caching strategies, eviction policies (LRU vs LFU), and when to use caches vs databases, see our guide on Databases & Caching.
Cache-aside pattern:
1. Request comes in for /aB3xY
2. Check Redis
3. Cache hit? Return immediately (~1ms)
4. Cache miss? Query PostgreSQL replica (~5-10ms)
5. Store result in Redis for next time
6. Return the long URL
What to cache:
- Hot links (top 10% probably get 90% of traffic)
- Can cache with a long TTL (a live short code's mapping doesn't change, so the TTL only needs to respect the link's expiration)
- Use LRU eviction to keep cache size manageable
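The cache-aside steps and LRU eviction described above can be sketched as follows. LRUCache is a tiny in-process stand-in for Redis, and db_lookup is a hypothetical callable representing the query against a PostgreSQL replica; neither is a real client API.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Tiny in-process stand-in for Redis with LRU eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


def resolve(slug: str, cache: LRUCache,
            db_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside read: check the cache, fall back to the database, then populate."""
    long_url = cache.get(slug)
    if long_url is not None:
        return long_url            # cache hit
    long_url = db_lookup(slug)     # cache miss: query the replica
    if long_url is not None:
        cache.set(slug, long_url)  # store for next time
    return long_url
```

The application owns the cache population logic here, which is what distinguishes cache-aside from read-through caching.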
Impact:
Without cache: Every redirect = database hit = 5-10ms
With cache: 90% of redirects = Redis hit = 0.1-1ms
That's a 10x latency improvement for most traffic.
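The overall effect depends on the hit rate. As a rough estimate, assuming a 1ms cache lookup, a 10ms database query, and that a miss pays for both, the average latency works out like this:

```python
def average_latency_ms(hit_rate: float, cache_ms: float, db_ms: float) -> float:
    """Expected latency when a miss pays for the cache check plus the database query."""
    return hit_rate * cache_ms + (1 - hit_rate) * (cache_ms + db_ms)


without_cache = 10.0  # every redirect hits the database
with_cache = average_latency_ms(hit_rate=0.9, cache_ms=1.0, db_ms=10.0)  # about 2 ms

print(f"average: {with_cache:.1f} ms vs {without_cache:.1f} ms without a cache")
```

So cache hits are about 10x faster individually, while the average across all traffic improves by about 5x under these assumed numbers. Raising the hit rate closes that gap.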
Scaling the Database
Database Size
CREATE TABLE url_mappings (
    short_code VARCHAR(7) PRIMARY KEY,       -- 7-character Base62 code
    long_url TEXT,                           -- NULL until the code is assigned
    is_used BOOLEAN NOT NULL DEFAULT FALSE,  -- FALSE while the code is still free
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expiration TIMESTAMP DEFAULT (CURRENT_TIMESTAMP + INTERVAL '5 years')
);
Row size calculation:
- short_code: 7 bytes
- long_url: 100 bytes (average)
- is_used: 1 byte
- created_at: 8 bytes
- expiration: 8 bytes
- Total: 124 bytes per row (rounded to 150 bytes for overhead)
Note: For pre-generated codes (before assignment), we only need short_code and is_used (8 bytes total).
If we pre-generate in batches of 1 billion URLs, even if we round 150 to 500 bytes per row, our database would need a maximum size of 500 GB per 1 billion URLs generated.
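The sizing arithmetic above is simple enough to verify directly. This uses decimal gigabytes (10^9 bytes) and the same per-row figures as the calculation above; actual on-disk size will differ with index and storage-engine overhead.

```python
ROW_BYTES = 7 + 100 + 1 + 8 + 8  # 124 bytes of column data per assigned row
PADDED_ROW_BYTES = 500           # deliberately generous per-row estimate
UNASSIGNED_ROW_BYTES = 7 + 1     # short_code + is_used only
BATCH_ROWS = 1_000_000_000       # one batch of pre-generated codes

batch_gb = BATCH_ROWS * PADDED_ROW_BYTES / 10**9           # 500 GB upper bound
unassigned_gb = BATCH_ROWS * UNASSIGNED_ROW_BYTES / 10**9  # 8 GB before assignment

print(f"{ROW_BYTES} bytes/row, {batch_gb:.0f} GB padded, {unassigned_gb:.0f} GB unassigned")
```

Both figures fit comfortably on a single large database server, which supports scaling the master vertically.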
Generating a batch of this size takes significant time, so rather than waiting until every code is used, it's worth generating the next batch in the background ahead of demand. We don't need to get too deep into the details for this design.
If we create an index on is_used (in PostgreSQL, a partial index covering only the unused rows keeps it small), we can retrieve the next unused code very quickly.
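Here is a minimal sketch of claiming the next unused code in a single transaction. It uses Python's built-in sqlite3 module purely so the example is runnable; the design above uses PostgreSQL, where concurrent writers would typically add SELECT ... FOR UPDATE SKIP LOCKED so they don't block each other. The claim_code function and the simplified three-column table are assumptions for illustration.

```python
import sqlite3
from typing import Optional


def claim_code(conn: sqlite3.Connection, long_url: str) -> Optional[str]:
    """Assign one unused code to long_url inside a single transaction."""
    with conn:  # commits on success, rolls back on exception
        row = conn.execute(
            "SELECT short_code FROM url_mappings WHERE is_used = 0 LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # pool exhausted; the background job should refill it
        updated = conn.execute(
            "UPDATE url_mappings SET is_used = 1, long_url = ? "
            "WHERE short_code = ? AND is_used = 0",
            (long_url, row[0]),
        )
        # rowcount == 0 would mean another writer claimed the code first.
        return row[0] if updated.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE url_mappings ("
    "short_code TEXT PRIMARY KEY, long_url TEXT, is_used INTEGER DEFAULT 0)"
)
# Partial index: only unused rows are indexed, so it shrinks as codes are claimed.
conn.execute("CREATE INDEX idx_unused ON url_mappings (is_used) WHERE is_used = 0")
conn.executemany("INSERT INTO url_mappings (short_code) VALUES (?)",
                 [("aaaaaaa",), ("aaaaaab",)])
conn.commit()
```

The `AND is_used = 0` guard on the UPDATE is what makes the claim safe: if two writers select the same row, only one UPDATE changes it, and the other sees a rowcount of zero and can retry.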
Geographic latency consideration:
What about dealing with latency geographically? If we have only 1 database, then we have additional latency as our servers query this database depending on its location in the world.
Master-Replica Setup:
Write Server sends all writes to Master
Master replicates to Read Replicas (2-3 instances)
Read Servers query Replicas instead of Master
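The routing rule is small enough to sketch directly. DatabaseRouter and the connection names below are hypothetical; a real setup would hold connection pools rather than strings, and might pick the geographically nearest replica instead of rotating.

```python
from itertools import cycle
from typing import Iterator, List


class DatabaseRouter:
    """Send writes to the master and spread reads across replicas round-robin."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        self._replicas: Iterator[str] = cycle(replicas)

    def for_write(self) -> str:
        return self.master

    def for_read(self) -> str:
        return next(self._replicas)


router = DatabaseRouter("master", ["replica-1", "replica-2", "replica-3"])
```

Successive calls to router.for_read() rotate through the three replicas, while router.for_write() always returns the master.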
Master DB can scale vertically, as we calculated, and we have read copies to reduce network latency.
Why this works:
- Takes redirect traffic, which is 10-100x the write volume, off the master
- Read replicas can be geographically distributed closer to our read servers
- Master focuses on writes only
Replication lag consideration:
- Usually under 1 second
- For a URL shortener, this is fine
- Users won't notice if their new short URL takes 1 second to propagate
For a comprehensive guide on when to scale up vs scale out, see Scaling: Vertical vs Horizontal.
Common Interview Mistakes
Mistake 1: Generating codes on the fly
"We'll just hash the URL and use the first 6 characters."
Problem: Hash collisions. Two different URLs might hash to the same short code.
Better: Pre-generate codes. No collisions possible.
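To see how quickly truncated hashes collide, the sketch below applies the standard birthday-bound approximation. It assumes the 6 characters come from a hexadecimal digest, giving 16^6 possible codes; a different encoding changes the keyspace but not the shape of the curve.

```python
import math


def collision_probability(num_urls: int, keyspace: int) -> float:
    """Birthday-bound approximation of P(at least one collision) among random codes."""
    return 1.0 - math.exp(-num_urls * (num_urls - 1) / (2 * keyspace))


HEX_6 = 16 ** 6  # 16,777,216 possible codes, assuming 6 hex characters

p_5k = collision_probability(5_000, HEX_6)    # already better than even odds
p_50k = collision_probability(50_000, HEX_6)  # effectively certain

print(f"5,000 URLs: {p_5k:.0%} chance of a collision")
```

With only about 5,000 URLs the odds of at least one collision already exceed 50%, long before the keyspace is anywhere near full.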
Mistake 2: Not separating reads and writes
"We'll just use one set of servers for everything."
Problem: Read traffic overwhelms write logic. Can't scale them independently.
Better: Separate read servers from write servers. Scale reads horizontally (see Scaling: Vertical vs Horizontal for when and how to scale horizontally).
Mistake 3: Hitting the database for every redirect
"We'll just query PostgreSQL for every GET request."
Problem: Database becomes the bottleneck. Latency suffers.
Better: Cache hot links in Redis. 90% of requests never hit the database. Learn more about caching strategies in Databases & Caching.
Mistake 4: Overcomplicating the write path
"We'll use Kafka and a distributed queue and microservices..."
Problem: You're adding complexity for no reason.
Better: One write server grabbing pre-generated codes is simple and fast.
Mistake 5: Not thinking about the read-to-write ratio
"We'll scale everything equally."
Problem: Waste of resources. You don't need 100 write servers when you get around 100 writes/second.
Better: Scale reads aggressively. Keep writes minimal. Follow the traffic pattern.
Interview golden rule:
Don't just list components. Explain WHY you're making each choice and HOW it handles the specific challenges of a URL shortener (uniqueness, read-heavy traffic, low latency).