Scaling: Vertical vs Horizontal

Understanding when to scale up vs scale out in distributed systems

S
System Design Sandbox··12 min read
Learn the fundamental differences between vertical and horizontal scaling, when to use each approach, load balancers and API gateways, and how to make scaling decisions in system design interviews.

Introduction

Picture this: You're in your system design interview, and you just drew out a pretty solid architecture. Single server, database, cache, looking good.

Then the interviewer leans forward: "Okay, so this works for 1,000 users. What happens when you hit 1 million users? How do you scale this?"

And now you're standing there thinking... do I make the server bigger? Add more servers? What's the difference? When do I use which approach?

Don't panic. I'm going to break down vertical vs horizontal scaling in a way that'll make you sound like you've actually scaled production systems before.


The Two Ways to Scale: Bigger vs More

When your system needs to handle more load, you have exactly two options:

Vertical Scaling (Scale Up): Make your existing server more powerful

Horizontal Scaling (Scale Out): Add more servers

That's it. Every scaling strategy comes down to one of these two approaches (or a combo of both).


Vertical Scaling (Scale Up)

What It Means

Vertical scaling = throw more hardware at your existing server.

You're upgrading:

  • CPU cores (4 to 16)
  • RAM (16GB to 64GB)
  • Storage (HDD to SSD to NVMe)
  • Network bandwidth

The mental model:

Before: One server with 4 cores
After: Same server, now with 16 cores

The Pros

1. Dead simple

  • Often just click a button in your cloud console
  • No code changes required
  • No architectural complexity

2. No distributed system headaches

  • No load balancer needed
  • No data synchronization issues
  • No network latency between servers

3. Fast to implement

  • Takes minutes to hours, not days
  • Perfect when you need to scale RIGHT NOW

The Cons

1. Hardware limits There's a ceiling. You can't just keep adding RAM forever. Eventually, you hit the biggest machine available.

2. Single point of failure Your one beefy server goes down? Your entire app goes down. No redundancy.

3. Gets expensive FAST Going from a $100/month server to a $1000/month server doesn't give you 10x the performance. The cost curve is brutal.

4. Downtime during upgrades Need to upgrade? You're probably taking the server offline.

When to Use Vertical Scaling

Small to medium apps

  • Not expecting massive scale
  • Current server is only using 30-40% of resources

Legacy systems

  • Code wasn't built for distributed architecture
  • Refactoring would take months

Quick wins

  • Traffic spike coming tomorrow
  • No time for major architecture changes

Stateful applications

  • Databases (often)
  • In-memory caches
  • Systems where data locality matters

Example

Scenario: Your web server is hitting 80% CPU during peak hours.

Vertical scaling solution:

Current: 4 cores, 8GB RAM at $100/month
Upgrade: 8 cores, 16GB RAM at $200/month

Result: CPU drops to 40% during peak hours
Time to implement: 15 minutes

Horizontal Scaling (Scale Out)

What It Means

Horizontal scaling = add more servers running your application.

The mental model:

Before: 1 server handling 1,000 req/sec
After: 5 servers each handling 200 req/sec

The Pros

1. Practically unlimited scaling Need more capacity? Just add more servers. No hard ceiling.

2. Fault tolerance One server dies? The other 4 keep running. Your app stays up.

3. Cost-effective at scale Add cheap commodity servers instead of buying one super expensive machine.

4. No downtime for scaling Add new servers while the old ones keep running.

5. Can scale down easily Traffic drops? Remove servers and save money.

The Cons

1. Way more complex

  • Need a load balancer or API gateway
  • Have to handle distributed state
  • More moving parts = more things that can break

2. Architectural changes required

  • Your app needs to be stateless (or handle state carefully)
  • Can't store sessions on a single server
  • Database connections need pooling

3. Higher operational overhead

  • More servers to monitor
  • More deployment complexity
  • Network latency between servers

4. Not everything can scale horizontally Some things are inherently hard to distribute (like stateful systems or certain databases).

When to Use Horizontal Scaling

High-traffic applications

  • Millions of users
  • Unpredictable traffic spikes
  • Need to handle Black Friday-level load

Need high availability

  • Downtime costs serious money
  • SLAs require 99.99% uptime
  • Can't afford single points of failure

Stateless workloads

  • Web servers
  • API servers
  • Microservices

Cloud-native applications

  • Built from the ground up to be distributed
  • Containerized (Docker/Kubernetes)

Example

Scenario: Your API is handling 5,000 req/sec, and you need to scale to 25,000 req/sec.

Horizontal scaling solution:

Current: 1 server handling 5,000 req/sec
Add: 4 more identical servers
Add: Load balancer to distribute traffic

Result: 5 servers x 5,000 req/sec = 25,000 req/sec total
Can add more servers as needed
If 1 server fails, still have 20,000 req/sec capacity

API Gateway / Load Balancer: The Traffic Cop

When you scale horizontally, you need something to distribute traffic across all your servers. Enter: the load balancer (or API gateway).

What It Does

The load balancer sits in front of your servers:

Client goes to Load Balancer which routes to:
  - Server 1
  - Server 2
  - Server 3
  - Server 4
  - Server 5

Key Functions

1. Request routing Sends each request to an available server.

2. Load distribution Spreads traffic evenly so no single server gets overwhelmed.

Common strategies:

  • Round-robin: Server 1, Server 2, Server 3, repeat
  • Least connections: Send to the server with fewest active connections
  • IP hash: Same user always goes to same server (for sticky sessions)

3. Health checks Monitors servers and stops sending traffic to dead ones.

4. SSL termination Handles HTTPS so your backend servers don't have to.

5. Authentication and rate limiting (API Gateway) Centralizes security logic instead of duplicating it across servers.

API Gateway vs Load Balancer

Load Balancer:

  • Simpler, just routes traffic
  • Layer 4 (TCP) or Layer 7 (HTTP)
  • Examples: NGINX, HAProxy, AWS ALB

API Gateway:

  • Does everything a load balancer does PLUS:
  • Request/response transformation
  • API composition (calling multiple services)
  • Caching
  • Examples: Kong, AWS API Gateway, Azure API Management

Example

E-commerce site with 10,000 req/min:

API Gateway receives: 10,000 req/min
Distributes to: 5 backend servers
Each server handles: ~2,000 req/min

If one server fails:
- Gateway detects failure (health check)
- Stops sending traffic to that server
- Remaining 4 servers now handle ~2,500 req/min each
- App stays online

Quick Comparison Table

AspectVertical ScalingHorizontal Scaling
What you doUpgrade one serverAdd more servers
ComplexityLowHigh
Cost at scaleExpensiveMore cost-effective
Fault toleranceSingle point of failureHigh (redundancy)
Scaling limitHardware ceilingPractically unlimited
Time to implementMinutes to hoursDays to weeks
DowntimeOften requiredZero-downtime possible
Best forLegacy apps, quick fixesModern distributed apps

The Hybrid Approach (Real Talk)

Here's the secret: Most real systems use BOTH.

Common pattern:

  1. Start with vertical scaling (it's faster and simpler)
  2. Once you hit hardware limits, switch to horizontal
  3. Continue scaling horizontally as needed

Example architecture:

  • Web tier: Horizontally scaled (easy to distribute)
  • Database: Vertically scaled (harder to distribute)
  • Cache layer: Horizontally scaled (Redis cluster)

Don't fall into the trap of thinking you have to choose one forever.


Common Interview Mistakes

Mistake 1: "We'll just scale horizontally for everything"

Wrong. Some things (like traditional databases) are harder to scale horizontally. Know when vertical makes more sense.

Mistake 2: Not mentioning trade-offs

Don't just say "we'll add a load balancer." Explain that this adds complexity but gives you redundancy and unlimited scaling.

Mistake 3: Forgetting about stateful components

"We'll horizontally scale by adding 10 servers!"

But wait - where's the session data stored? What about file uploads? Not everything can be stateless.

Mistake 4: Ignoring costs

Saying "we'll just add 100 servers" without acknowledging the cost implications makes you sound inexperienced.


How to Talk About Scaling in Interviews

Bad answer: "We'll use horizontal scaling because it's better."

Good answer: "Initially, we can vertically scale the web server since we're only at 30% CPU usage. Once we hit the hardware limits around 100k users, we'll transition to horizontal scaling by adding a load balancer and deploying multiple instances of our stateless web tier."

Great answer: "I'd use a hybrid approach. The web tier will be horizontally scaled behind a load balancer since it's stateless and easy to distribute - this gives us redundancy and practically unlimited capacity. For the database, I'd start with vertical scaling since managing a distributed database adds significant complexity. Once we hit database limits, we could look at read replicas for horizontal read scaling, but keep writes vertical for consistency."


Decision Framework

Ask yourself these questions:

QuestionAnswerRecommendation
How much time do I have?Hours/daysVertical
Weeks/monthsHorizontal
What's my budget?TightVertical (short-term)
FlexibleHorizontal (long-term)
Can my app handle distributed architecture?No (legacy)Vertical
Yes (modern)Horizontal
How important is redundancy?CriticalHorizontal
Can tolerate downtimeVertical
What's my growth trajectory?ModestVertical
ExponentialHorizontal

Summary: What to Remember

Vertical Scaling (Scale Up):

  • Make one server more powerful
  • Simple but has hardware limits
  • Good for quick wins and legacy systems
  • Single point of failure

Horizontal Scaling (Scale Out):

  • Add more servers
  • Complex but practically unlimited
  • Need load balancer/API gateway
  • Fault-tolerant and cost-effective at scale

The load balancer:

  • Distributes traffic across servers
  • Enables zero-downtime scaling
  • Provides redundancy through health checks
  • Essential for horizontal scaling

The reality: Most systems use both. Start simple (vertical), scale out when needed (horizontal).

Interview golden rule:

Don't just name a strategy - explain the trade-offs
and justify your choice based on requirements.

Now go scale some systems like a pro.