Sandboxed Code Execution

Running untrusted user code with isolation, limits, and safe teardown

System Design · Sandbox · 11 min read
Learn how online judges and execution platforms run untrusted code safely. Covers threat modeling, containers versus MicroVMs, cgroups, seccomp, filesystem and network isolation, timeouts, output limits, and warm pools.

#Introduction

You are designing an online judge. A user submits a classic shell fork bomb:

:(){ :|:& };:

The product requirement says "run user code." The security requirement says "do not let random internet code take over the host."

That is the real problem. The judge is not just a queue and a worker. It is a controlled execution environment for untrusted code, with strict limits on CPU, memory, filesystem access, network access, output size, and runtime.

This article pairs with Async Job Worker Pools and the Online Judge solution. If you want to practice the full design, start with the Online Judge practice problem.


#Threat Model

Assume submitted code is hostile.

It may try to:

  • read environment variables
  • scan the internal network
  • write huge files
  • fork too many processes
  • use too much memory
  • run forever
  • exploit the kernel or runtime
  • leak test cases through logs or side channels

The system should fail closed. If the sandbox cannot be created safely, reject or delay the submission instead of running it on a shared worker host. That failure path should be modeled as part of the broader async execution pipeline, not hidden inside a best-effort worker script.


#Sandbox Layers

A practical design uses multiple layers:

  • container or microVM per submission
  • read-only base image
  • mounted working directory with size limits
  • no outbound network by default
  • CPU and memory cgroups
  • process count limits
  • seccomp profile or runtime syscall filters
  • wall-clock timeout
  • output byte limit

(Diagram: Execution Job → Judge Worker → Runtime Limits → MicroVM Sandbox → Verdict Store)
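Several of these layers map directly onto kernel resource limits. A minimal sketch, assuming a POSIX worker host: the limits are applied in the child via `setrlimit` between fork and exec, and the wall-clock timeout is enforced separately by the parent. Function name and limit values are illustrative, not tuned recommendations.

```python
import resource
import subprocess

def run_limited(cmd, cpu_seconds=2, memory_bytes=256 * 1024 * 1024,
                max_procs=32, max_file_bytes=1024 * 1024,
                wall_clock_seconds=5):
    """Run cmd in a child process with kernel-enforced resource limits."""
    def apply_limits():
        # CPU time in seconds of CPU, not wall clock
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Virtual memory ceiling
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
        # Process count cap: blunts fork bombs
        resource.setrlimit(resource.RLIMIT_NPROC, (max_procs, max_procs))
        # File size cap: blunts "write huge files"
        resource.setrlimit(resource.RLIMIT_FSIZE, (max_file_bytes, max_file_bytes))

    try:
        return subprocess.run(
            cmd,
            preexec_fn=apply_limits,     # runs in the child, after fork, before exec
            capture_output=True,
            timeout=wall_clock_seconds,  # wall-clock limit, independent of CPU limit
        )
    except subprocess.TimeoutExpired:
        return None  # caller classifies this as "time limit exceeded"
```

Note that `RLIMIT_CPU` and the wall-clock timeout are different limits: a program that sleeps forever burns no CPU, so the parent-side timeout is still required.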

Containers are fast and convenient, but they share the host kernel. MicroVMs such as Firecracker add a stronger boundary at the cost of more startup overhead. In an interview, make the tradeoff explicit:

| Isolation model | Benefit | Cost |
| --- | --- | --- |
| Process sandbox | fastest startup | weakest isolation |
| Container | mature tooling, fast pools | shared kernel risk |
| MicroVM | stronger isolation | higher startup and orchestration cost |

For a serious online judge, warm microVM or container pools are a strong default. Warm pools are also where scaling, capacity reservations, and per-language isolation become concrete instead of abstract.
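A warm pool can be sketched as a bounded queue of pre-booted, single-use sandboxes. The `Sandbox` class below is a hypothetical stand-in for a real container or microVM handle, and replenishment is done synchronously to keep the sketch short; a production pool would boot replacements in the background.

```python
import queue

class Sandbox:
    """Hypothetical handle to a pre-booted container or microVM."""
    def __init__(self, language):
        self.language = language  # pools are typically kept per language

class WarmPool:
    """Keep a bounded set of pre-booted sandboxes so submissions skip cold starts."""
    def __init__(self, language, size):
        self.language = language
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(Sandbox(language))  # boot ahead of demand

    def acquire(self, timeout=1.0):
        # Fail closed: no warm capacity means delay or reject the submission,
        # never run it directly on the shared worker host.
        try:
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            raise RuntimeError("no warm sandbox; delay or reject the submission")

    def retire_and_replenish(self, sandbox):
        # Sandboxes are single-use: destroy after the run, then boot a
        # replacement so the pool stays full.
        del sandbox
        self._idle.put(Sandbox(self.language))
```

The single-use rule is the important design choice: reusing a sandbox across submissions risks leaking one user's test data or filesystem state to the next.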


#Runtime Controls

Each run should have a signed execution spec:

{
  "submissionId": "sub_123",
  "language": "python3",
  "image": "judge-python:3.12",
  "cpuMillis": 2000,
  "memoryMb": 256,
  "wallClockMillis": 5000,
  "network": "disabled",
  "maxOutputBytes": 65536
}
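One way to make the spec tamper-evident is an HMAC over a canonical JSON encoding, so the worker can reject specs that were not issued by the control plane. The key constant and helper names below are assumptions for the sketch, not part of any particular judge.

```python
import hashlib
import hmac
import json

# Assumption: a secret shared between the control plane and judge workers
SIGNING_KEY = b"replace-with-judge-control-plane-key"

def sign_spec(spec: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so signer and verifier
    # hash byte-identical payloads
    payload = json.dumps(spec, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def verify_spec(spec: dict, signature: str) -> bool:
    # Constant-time comparison avoids a timing side channel
    return hmac.compare_digest(sign_spec(spec), signature)
```

Without this, a compromised queue could rewrite `memoryMb` or re-enable the network before the worker ever sees the job.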

The worker should compile and execute inside the sandbox, collect stdout/stderr, classify the result, and destroy the environment. The verdict should be persisted durably, then pushed to the user over WebSockets or a similar realtime channel.

Common verdicts include:

  • accepted
  • wrong answer
  • compile error
  • runtime error
  • time limit exceeded
  • memory limit exceeded
  • output limit exceeded
  • internal error
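The classification step can be sketched as a priority-ordered mapping. The `RunResult` fields are illustrative names for what the worker collects from the sandbox, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    # What a judge worker might collect from the sandbox (names illustrative)
    compile_error: bool
    timed_out: bool
    memory_exceeded: bool
    output_truncated: bool
    exit_code: int
    stdout: str

def classify(run: RunResult, expected_output: str) -> str:
    # Order matters: resource violations outrank wrong answers, because a
    # killed process produces garbage output anyway.
    if run.compile_error:
        return "compile error"
    if run.timed_out:
        return "time limit exceeded"
    if run.memory_exceeded:
        return "memory limit exceeded"
    if run.output_truncated:
        return "output limit exceeded"
    if run.exit_code != 0:
        return "runtime error"
    if run.stdout.strip() != expected_output.strip():
        return "wrong answer"
    return "accepted"
```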

Do not store raw logs without size limits. A malicious submission can generate gigabytes of output.
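Capping output is simplest at read time, before anything is buffered or persisted. A sketch that stops reading a child's pipe at the `maxOutputBytes` budget (the function name is hypothetical):

```python
def read_capped(stream, max_output_bytes=65536, chunk_size=4096):
    """Read at most max_output_bytes from a child's stdout pipe.

    Returns (data, truncated). Reading stops at the cap, so a submission
    printing gigabytes cannot exhaust worker memory or log storage.
    """
    chunks, total = [], 0
    while total < max_output_bytes:
        chunk = stream.read(min(chunk_size, max_output_bytes - total))
        if not chunk:
            break
        chunks.append(chunk)
        total += len(chunk)
    # Peek one more byte: if anything remains, the output was over budget
    truncated = bool(stream.read(1))
    return b"".join(chunks), truncated
```

The `truncated` flag is what lets the worker emit an "output limit exceeded" verdict instead of silently judging a clipped answer as wrong.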


#Common Interview Mistakes

Mistake 1: Running code directly on the worker.

Workers are orchestration processes. The submitted program should run in a sandboxed child environment.

Mistake 2: Saying "Docker" and stopping there.

Docker is not a complete security answer. Mention cgroups, namespaces, seccomp, read-only filesystems, network isolation, and timeouts.
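As a concrete sketch, those controls translate into `docker run` flags roughly like this. The values and the seccomp profile filename are illustrative; a production judge would tune them per language.

```python
def docker_run_args(image, cmd, *, memory_mb=256, cpu_millis=2000,
                    max_pids=64, seccomp_profile="judge-seccomp.json"):
    """Build a hardened `docker run` invocation (values are illustrative)."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no outbound network
        "--read-only",                          # read-only base image
        "--tmpfs", "/tmp:rw,size=16m",          # bounded scratch space
        "--memory", f"{memory_mb}m",            # memory cgroup
        "--cpus", f"{cpu_millis / 1000}",       # CPU cgroup
        "--pids-limit", str(max_pids),          # fork-bomb defense
        "--cap-drop", "ALL",                    # drop Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid escalation
        "--security-opt", f"seccomp={seccomp_profile}",  # syscall filter
        image, *cmd,
    ]
```

Even with all of these flags, the container still shares the host kernel, which is the argument for the microVM boundary above.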

Mistake 3: Forgetting warm pools.

Creating a fresh VM per test case can blow the latency budget. Keep warm capacity for common languages.

Mistake 4: Trusting language runtimes.

Python, JavaScript, Java, and C++ all need external resource limits. Language-level timeouts are not enough.


#Summary: What to Remember

Sandboxed code execution is defense in depth.

For an online judge, model submitted code as hostile. Run it in isolated containers or microVMs, disable network access, enforce CPU and memory limits, cap output, and destroy the environment after execution. Use warm pools when latency matters, but do not trade away the isolation boundary.

Related articles: Async Job Worker Pools, Message Queues, WebSockets & Real-Time Communication, and Design an Online Judge.