Design an Online Judge

Building a LeetCode-style platform for secure code execution and contest ranking

System Design Sandbox · 15 min read
Learn how to design an online judge like LeetCode. Covers async submission execution, secure sandboxing, language worker pools, verdict persistence, WebSocket feedback, and Redis-backed contest leaderboards.

#Introduction

The interviewer says: "Design LeetCode."

A weak answer starts with problems, submissions, and a database. A stronger answer starts with the risk: users submit untrusted code, the system must execute it safely, and contest users expect feedback quickly.

An online judge is an async execution pipeline with a secure sandbox and a real-time results path.

Ready to practice? Try the Online Judge practice problem and build this system step-by-step with AI-guided feedback.

Related concepts for this design: Message Queues, Redis Sorted Sets, Scaling, Idempotency & Deduplication, and System Design Structure.


#Functional Requirements

1. Submit and execute code

  • Users submit source code for a problem and language
  • The platform compiles or interprets the code
  • The judge runs hidden and sample test cases
  • The system returns a verdict

Execution should use async job worker pools. The API accepts the submission and returns quickly. Workers compile and run the code later.
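A minimal sketch of this split, using Python's standard `queue` module as a stand-in for a real message broker (the `submit` and `worker_loop` names, and the `sub_` ID prefix, are illustrative assumptions):

```python
import queue
import uuid

execution_queue: "queue.Queue[dict]" = queue.Queue()

def submit(problem_id: str, language: str, source_code: str) -> dict:
    """API path: persist the submission (omitted here), enqueue an
    execution job, and return immediately without running any code."""
    submission_id = f"sub_{uuid.uuid4().hex[:8]}"
    execution_queue.put({
        "submissionId": submission_id,
        "problemId": problem_id,
        "language": language,
        "sourceCode": source_code,
    })
    return {"submissionId": submission_id, "status": "queued"}

def worker_loop() -> None:
    """Worker path: claim jobs and judge them asynchronously."""
    while True:
        job = execution_queue.get()
        # compile and run inside a sandbox here; record the verdict
        execution_queue.task_done()
```

The key property: `submit` never blocks on compilation or execution, so API latency stays flat even when judging is slow.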

2. Competition leaderboard

  • Contests have many simultaneous submissions
  • Users need rank updates quickly
  • Ranking may depend on solved count, penalty time, or score

Use Redis Sorted Sets or a similar ranked serving structure. Do not run full SQL aggregates for every leaderboard refresh.


#Non-Functional Requirements

Secure sandboxing

User code is hostile by default. Run it in a sandboxed execution environment with CPU, memory, process, output, filesystem, and network limits.
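As a baseline illustration (not a full sandbox), POSIX resource limits can be applied to a child process before it executes user code. This sketch uses Python's `resource` and `subprocess` modules; the specific limit values are assumptions, and production judges layer this under containers or microVMs rather than relying on rlimits alone:

```python
import resource
import subprocess
import sys

def apply_limits() -> None:
    """Runs in the child after fork, before exec (POSIX only)."""
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))                    # CPU seconds
    resource.setrlimit(resource.RLIMIT_AS, (1 << 30, 1 << 30))         # 1 GiB address space
    resource.setrlimit(resource.RLIMIT_NPROC, (16, 16))                # fork bomb guard
    resource.setrlimit(resource.RLIMIT_FSIZE, (1 << 20, 1 << 20))      # 1 MiB file writes

def run_limited(cmd: list[str], stdin_data: str, wall_timeout: float = 5.0):
    """Wall-clock timeout backstops RLIMIT_CPU (which a sleeping
    process never consumes)."""
    return subprocess.run(cmd, input=stdin_data, capture_output=True,
                          text=True, timeout=wall_timeout,
                          preexec_fn=apply_limits)
```

Note what rlimits do not cover: network access and filesystem visibility still need namespaces, seccomp, or a microVM boundary.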

Feedback latency

Most submissions should reach a verdict within a few seconds. Use warm worker pools for common languages, prioritize contest queues, and push status over WebSockets.

Reliability

Persist submissions before enqueueing execution jobs. If a worker crashes, the job should be reclaimed or marked as an infrastructure failure. This is where message visibility timeouts and idempotent consumers matter.

Fairness

Contest traffic should not be starved by bulk practice submissions. Use separate queues and quotas.
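One way to sketch that quota: a weighted dispatcher that gives contest jobs three of every four slots but never wastes a slot when one queue is empty (the 3:1 ratio is an assumed example value):

```python
from collections import deque

contest_q: deque = deque()
practice_q: deque = deque()

def next_job(tick: int):
    """Contest jobs get 3 of every 4 dispatch slots; practice gets the
    fourth. Fall back to the other queue when the preferred one is empty."""
    if tick % 4 != 3:
        primary, fallback = contest_q, practice_q
    else:
        primary, fallback = practice_q, contest_q
    if primary:
        return primary.popleft()
    if fallback:
        return fallback.popleft()
    return None
```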


#API Design

Create submission

POST /api/v1/submissions

Request:

{
  "problemId": "two-sum",
  "language": "python3",
  "sourceCode": "print('hello')",
  "contestId": "weekly-451"
}

Response:

{
  "submissionId": "sub_123",
  "status": "queued"
}

Get submission result

GET /api/v1/submissions/sub_123

Response:

{
  "submissionId": "sub_123",
  "status": "accepted",
  "runtimeMs": 42,
  "memoryKb": 10240
}

Contest leaderboard

GET /api/v1/contests/weekly-451/leaderboard?cursor=0&limit=50

#High Level Design

Contest Client → API Gateway → Submission Service → Execution Queue → Worker Pool → MicroVM Sandbox

Supporting components: Submission DB, Leaderboard Cache, Result Pub/Sub

The API gateway authenticates users and sends submissions to the submission service. The submission service stores the source and metadata, then enqueues an execution job. This follows the same decoupling principle as Async Processing.

Judge workers claim jobs from the queue, start a sandbox, load test cases, run the code, and persist verdicts. Result events update the user's live connection and update the contest leaderboard cache. The ranking side is a direct application of Redis Sorted Sets.


#Detailed Design

Submission storage

Store source code, language, problem id, user id, contest id, status, and timestamps. The source of truth is durable storage, not the queue message.
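The record above can be sketched as a simple dataclass; the field names and the `queued` default are assumptions for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Submission:
    """Durable submission record: the source of truth for a judgment.
    Queue messages carry only the submission_id, never the payload."""
    submission_id: str
    user_id: str
    problem_id: str
    language: str
    source_code: str
    contest_id: Optional[str] = None
    status: str = "queued"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```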

Execution

Workers should run compile and test steps inside the sandbox. Each test has time and memory limits. The judge records the verdict class of the first failing test, but hidden test inputs and expected outputs must never leak to the user.
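That verdict loop can be sketched as follows, with `run_case` standing in for the sandboxed execution of one test (the function names and the convention that a time-limit breach raises `TimeoutError` are assumptions):

```python
def judge_submission(run_case, test_cases) -> str:
    """run_case(stdin) -> stdout, raising TimeoutError on a time-limit
    breach. Returns only a verdict class; hidden test data never leaks."""
    for case in test_cases:
        try:
            output = run_case(case["input"])
        except TimeoutError:
            return "time_limit_exceeded"
        except Exception:
            return "runtime_error"
        if output.strip() != case["expected"].strip():
            return "wrong_answer"   # stop at the first failure
    return "accepted"
```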

Leaderboard

On accepted submissions, update contest score in a sorted set:

ZADD contest:weekly-451:rank score userId
ZREVRANGE contest:weekly-451:rank 0 49 WITHSCORES

Persist final contest results to durable storage after the contest or periodically from the serving cache. Redis is the fast serving layer; the durable store remains the source of truth, as in Databases & Caching.
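When ranking depends on more than one field (solved count descending, penalty time ascending), both can be packed into the single numeric score a sorted set requires. A sketch, assuming an upper bound on total penalty minutes:

```python
MAX_PENALTY = 10**6  # assumed upper bound on total penalty minutes

def contest_score(solved: int, penalty_minutes: int) -> int:
    """Pack (solved desc, penalty asc) into one sorted-set score so that
    ZREVRANGE returns users already in contest rank order."""
    return solved * MAX_PENALTY + (MAX_PENALTY - 1 - penalty_minutes)
```

With this encoding, a user with more solved problems always outranks one with fewer, and penalty time only breaks ties.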

Realtime results

Use WebSockets for submission status changes:

queued -> running -> accepted
queued -> running -> wrong_answer
queued -> running -> time_limit_exceeded

Polling can remain as a fallback.
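Guarding these transitions server-side keeps out-of-order events (a late `running` arriving after a verdict) from corrupting displayed state. A minimal sketch; the `runtime_error` terminal state is an assumed addition beyond the three verdicts shown above:

```python
TRANSITIONS = {
    "queued": {"running"},
    "running": {"accepted", "wrong_answer",
                "time_limit_exceeded", "runtime_error"},
    # terminal states have no outgoing transitions
}

def advance(current: str, nxt: str) -> str:
    """Apply a status event only if the transition is legal."""
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```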


#Common Interview Mistakes

  • Running submitted code on the API server
  • Saying "Docker" without CPU, memory, network, and filesystem limits
  • Storing test cases inside queue messages
  • Computing leaderboard ranks with SQL scans during contests
  • Retrying wrong answers as if they were infrastructure failures

See also: Sandboxed Code Execution, Async Job Worker Pools, Redis Sorted Sets, and the Real-time Leaderboard solution.