# Introduction
You are designing an online judge. A user submits a fork bomb (here in pseudocode):

```text
while true; do fork(); done
```
The product requirement says "run user code." The security requirement says "do not let random internet code take over the host."
That is the real problem. The judge is not just a queue and a worker. It is a controlled execution environment for untrusted code, with strict limits on CPU, memory, filesystem access, network access, output size, and wall-clock time.
This article pairs with Async Job Worker Pools and the Online Judge solution. If you want to practice the full design, start with the Online Judge practice problem.
# Threat Model
Assume submitted code is hostile.
It may try to:
- read environment variables
- scan the internal network
- write huge files
- fork too many processes
- use too much memory
- run forever
- exploit the kernel or runtime
- leak test cases through logs or side channels
The system should fail closed. If the sandbox cannot be created safely, reject or delay the submission instead of running it on a shared worker host. That failure path should be modeled as part of the broader async execution pipeline, not hidden inside a best-effort worker script.
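The fail-closed dispatch step can be sketched as follows. This is an illustrative stub, not a real API: `create_sandbox`, `SandboxError`, and the spec fields are hypothetical names standing in for whatever your orchestration layer provides.

```python
class SandboxError(Exception):
    """Raised when an isolated environment cannot be provisioned."""

def create_sandbox(spec):
    # Stub for illustration: a real implementation would start a
    # container or microVM and raise SandboxError on failure.
    if spec.get("force_failure"):
        raise SandboxError("no isolation available")
    return {"id": "sbx_1", "spec": spec}

def handle_submission(spec, requeue):
    try:
        sandbox = create_sandbox(spec)
    except SandboxError:
        # Fail closed: delay or reject; never fall back to running
        # the submission directly on the shared worker host.
        requeue(spec)
        return "requeued"
    try:
        return "ran"  # execute inside `sandbox` here
    finally:
        pass          # always destroy the sandbox, even on errors
```

The key property: the only two outcomes are "ran inside isolation" and "did not run", with the failure path feeding back into the queue.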
# Sandbox Layers
A practical design uses multiple layers:
- container or microVM per submission
- read-only base image
- mounted working directory with size limits
- no outbound network by default
- CPU and memory cgroups
- process count limits
- seccomp profile or runtime syscall filters
- wall-clock timeout
- output byte limit
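The innermost of these layers can be sketched with plain per-process rlimits on Linux. This is only a sketch of that one layer, assuming a Unix worker; the container or microVM boundary, cgroups, seccomp, and network isolation all sit outside this code.

```python
import resource
import subprocess

def _cap(limit, value):
    # Never try to raise a limit above the current hard limit.
    _, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))

def run_limited(cmd, cpu_seconds=2, memory_bytes=256 * 2**20,
                max_procs=64, wall_clock_seconds=5):
    def set_limits():
        # Runs in the child after fork, before exec.
        _cap(resource.RLIMIT_CPU, cpu_seconds)    # CPU time
        _cap(resource.RLIMIT_AS, memory_bytes)    # address space
        _cap(resource.RLIMIT_NPROC, max_procs)    # blunts fork bombs
        _cap(resource.RLIMIT_FSIZE, 64 * 2**10)   # scratch-file size
    # Wall-clock timeout is enforced by the parent, not the child:
    # a sleeping process burns no CPU time, so RLIMIT_CPU alone
    # would never stop it.
    return subprocess.run(cmd, preexec_fn=set_limits,
                          capture_output=True, timeout=wall_clock_seconds)
```

Note that `RLIMIT_FSIZE` caps file writes, not pipe output, so stdout still has to be bounded by whatever reads it.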
Containers are fast and convenient, but they share the host kernel. MicroVMs such as Firecracker add a stronger boundary at the cost of more startup overhead. In an interview, make the tradeoff explicit:
| Isolation model | Benefit | Cost |
|---|---|---|
| Process sandbox | fastest startup | weakest isolation |
| Container | mature tooling, fast pools | shared kernel risk |
| MicroVM | stronger isolation | higher startup and orchestration cost |
For a serious online judge, warm microVM or container pools are a strong default. Warm pools are also where scaling, capacity reservations, and per-language isolation become concrete instead of abstract.
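A warm pool itself is a small piece of code; the important invariant is that sandboxes are single-use. A minimal sketch (the `create_sandbox` factory is assumed to be whatever provisions a fresh container or microVM):

```python
import queue

class WarmPool:
    """Keeps pre-created sandboxes ready; falls back to a cold start."""

    def __init__(self, create_sandbox, size):
        self._create = create_sandbox
        self._ready = queue.Queue()
        for _ in range(size):
            self._ready.put(create_sandbox())

    def acquire(self):
        try:
            return self._ready.get_nowait()  # warm path: no startup cost
        except queue.Empty:
            return self._create()            # cold path: pay startup latency

    def replenish(self):
        # Called after a used sandbox has been destroyed.
        # A used sandbox is never returned to the pool.
        self._ready.put(self._create())
```

Sizing the pool per language is where capacity planning becomes concrete: a pool of N warm Python sandboxes is an explicit reservation, not a hope.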
# Runtime Controls
Each run should have a signed execution spec:
```json
{
  "submissionId": "sub_123",
  "language": "python3",
  "image": "judge-python:3.12",
  "cpuMillis": 2000,
  "memoryMb": 256,
  "wallClockMillis": 5000,
  "network": "disabled",
  "maxOutputBytes": 65536
}
```
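"Signed" can be implemented several ways; one minimal sketch is an HMAC over a canonical serialization of the spec, assuming a key shared between the scheduler and the workers (the key name and distribution mechanism here are illustrative):

```python
import hashlib
import hmac
import json

# Assumption: distributed to scheduler and workers via a secret store.
SIGNING_KEY = b"judge-spec-signing-key"

def sign_spec(spec: dict) -> str:
    # Canonical JSON so scheduler and worker hash identical bytes.
    payload = json.dumps(spec, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def verify_spec(spec: dict, signature: str) -> bool:
    # Constant-time compare; a tampered limit means the run is rejected.
    return hmac.compare_digest(sign_spec(spec), signature)
```

The point of signing is that a compromised or buggy component cannot quietly relax `memoryMb` or flip `network` to enabled: the worker verifies the spec before running anything.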
The worker should compile and execute inside the sandbox, collect stdout/stderr, classify the result, and destroy the environment. The verdict should be persisted durably, then pushed to the user over WebSockets or a similar realtime channel.
Common verdicts include:
- accepted
- wrong answer
- compile error
- runtime error
- time limit exceeded
- memory limit exceeded
- output limit exceeded
- internal error
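Classification is mostly a precedence question. A sketch of one plausible ordering, with resource verdicts masking the output comparison (the result-dict field names are illustrative, not a fixed schema):

```python
def classify(result: dict, expected_output: str) -> str:
    # Order matters: a run that timed out has no meaningful stdout,
    # so resource violations are checked before comparing output.
    if not result.get("compile_ok", True):
        return "compile error"
    if result.get("timed_out"):
        return "time limit exceeded"
    if result.get("memory_exceeded"):
        return "memory limit exceeded"
    if result.get("output_truncated"):
        return "output limit exceeded"
    if result.get("exit_code", 0) != 0:
        return "runtime error"
    if result.get("stdout", "").rstrip() == expected_output.rstrip():
        return "accepted"
    return "wrong answer"
```

Anything the worker cannot attribute to the submission (sandbox crash, lost spec, infrastructure fault) falls through to "internal error" at the orchestration layer, not here.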
Do not store raw logs without size limits. A malicious submission can generate gigabytes of output.
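The output cap is cheapest to enforce at read time, before anything is buffered or persisted. A sketch, using the `maxOutputBytes` figure from the spec above:

```python
import io

def read_bounded(stream, max_bytes: int):
    # Read one byte past the cap so truncation is detectable
    # without ever buffering unbounded output in memory.
    data = stream.read(max_bytes + 1)
    return data[:max_bytes], len(data) > max_bytes
```

The truncation flag feeds straight into the "output limit exceeded" verdict, and only the bounded prefix is ever stored.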
# Common Interview Mistakes
Mistake 1: Running code directly on the worker.
Workers are orchestration processes. The submitted program should run in a sandboxed child environment.
Mistake 2: Saying "Docker" and stopping there.
Docker is not a complete security answer. Mention cgroups, namespaces, seccomp, read-only filesystems, network isolation, and timeouts.
Mistake 3: Forgetting warm pools.
Creating a fresh VM per test case can blow the latency budget. Keep warm capacity for common languages.
Mistake 4: Trusting language runtimes.
Python, JavaScript, Java, and C++ all need external resource limits. Language-level timeouts are not enough.
# Summary: What to Remember
Sandboxed code execution is defense in depth.
For an online judge, model submitted code as hostile. Run it in isolated containers or microVMs, disable network access, enforce CPU and memory limits, cap output, and destroy the environment after execution. Use warm pools when latency matters, but do not trade away the isolation boundary.
Related articles: Async Job Worker Pools, Message Queues, WebSockets & Real-Time Communication, and Design an Online Judge.