Sandboxing II: Containers, Postgres & Production Isolation
Goal: climb the isolation ladder from Section 15. Process limits cap CPU and memory but leave the filesystem and network open. A container closes that gap; a locked-down Postgres role does the same for SQL; and an audit log records every execution. You’ll also learn where the stronger tiers — gVisor and Firecracker — fit.
Where this fits: this is the “production” tier of the sandboxing you started in Section 15. Tools (Sections 13–14) act; portable limits (Section 15) contain runaway resource use; here we contain what code can touch. Everything an agent (Section 22) runs unattended should sit behind one of these.
Mindset: match the box to the threat. Untrusted arithmetic needs a parser; untrusted code needs a container; untrusted SQL needs a read-only, time-boxed role. Over-building wastes effort; under-building is the breach.
Run untrusted code in a hardened container
The portable sandbox couldn’t really block the network or the filesystem. A container can.
We drive the docker CLI directly (no Python Docker library) and lock it down hard.
The whole lesson is in the flags. work/docker_exec.py:
import shutil
import subprocess

IMAGE = "python:3.12-slim"

def build_command(code: str, name: str) -> list[str]:
    return [
        "docker", "run", "--rm",               # ephemeral: deleted when it exits
        "--name", name,                        # a handle so we can kill it on timeout
        "--network", "none",                   # no network at all
        "--read-only",                         # read-only root filesystem
        "--cap-drop", "ALL",                   # drop every Linux capability
        "--security-opt", "no-new-privileges",
        "--pids-limit", "64",                  # bound the number of processes
        "--memory", "256m",                    # bound RAM
        "--cpus", "1.0",                       # bound CPU so a busy loop can't hog a core
        "--user", "65534:65534",               # run as 'nobody', never root
        IMAGE,
        "python", "-I", "-c", code,
    ]

def run_in_container(code: str, name: str, timeout: float = 60.0):
    cmd = build_command(code, name)
    if shutil.which("docker") is None:
        print("Docker not found -- skipping. Command would be:")
        print(" " + " ".join(cmd[:-1]) + f" {code!r}")
        return None
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        # The timeout killed our `docker run` client -- but the CONTAINER keeps
        # running. Stop it by name, then force-remove it (killing a --rm container
        # can leave a "Dead" husk), or it leaks past our process.
        subprocess.run(["docker", "kill", name], capture_output=True, check=False)
        subprocess.run(["docker", "rm", "-f", name], capture_output=True, check=False)
        return None
    return proc.returncode
Run normal code and it works; run code that opens a URL and it fails with a
name-resolution error — because --network none means there is no network to reach.
Each defense is one flag, and they stack: no network, no writable disk, no capabilities,
no root, bounded CPU/RAM/processes, and the whole thing vanishes on exit. The timeout
branch matters: killing the docker run client doesn’t stop the container, so you must
docker kill (then docker rm -f) it by name or it outlives your program.
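run_in_container hands back either None or an exit code, and the number alone is easy to misread. The sketch below is one way a caller might label it. The helper name describe_exit is hypothetical; the 125/126/127 meanings follow the documented docker run exit statuses, and treating values above 128 as "128 + signal number" is a shell convention and a heuristic here, since the code inside the container could also choose such a value itself.

```python
def describe_exit(returncode):
    """Label the value returned by a run_in_container-style function (hypothetical helper)."""
    if returncode is None:
        # The function in this section returns None both when Docker is
        # missing and when the timeout branch fired.
        return "skipped or timed out"
    if returncode == 0:
        return "ok"
    if returncode == 125:
        return "docker error"              # docker run itself failed (bad flag, daemon down)
    if returncode == 126:
        return "command not executable"    # the contained command could not be invoked
    if returncode == 127:
        return "command not found"         # the contained command does not exist in the image
    if returncode > 128:
        # Heuristic: 137 = 128 + 9 (SIGKILL), typical of hitting the --memory limit.
        return f"killed by signal {returncode - 128}"
    return "code exited non-zero"


print(describe_exit(137))
```

A caller would typically put this label in its log next to the raw number rather than replace it.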
The model can’t escape what isn’t there.
--read-only + --network none + --cap-drop ALL remove the capability to exfiltrate or persist, so even a perfect prompt-injection has nowhere to go. Isolation beats detection.
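Because the protection lives entirely in the flags, a refactor that drops one silently weakens the box. A minimal sketch of a guard you could run in a unit test follows. The helper name missing_hardening and the choice of which flags count as mandatory are assumptions for illustration; it scans the whole argument list, so a stricter version would stop at the image name.

```python
# Flags that take a value, and the value we insist on (assumed policy).
REQUIRED_PAIRS = {
    "--network": "none",
    "--cap-drop": "ALL",
    "--security-opt": "no-new-privileges",
}
# Flags that stand alone.
REQUIRED_SWITCHES = ("--read-only", "--rm")


def missing_hardening(cmd):
    """Return the hardening flags absent from a docker argument list."""
    missing = [flag for flag in REQUIRED_SWITCHES if flag not in cmd]
    adjacent = set(zip(cmd, cmd[1:]))          # every (arg, next_arg) pair
    for flag, value in REQUIRED_PAIRS.items():
        if (flag, value) not in adjacent:
            missing.append(f"{flag} {value}")
    return missing


hardened = ["docker", "run", "--rm", "--network", "none", "--read-only",
            "--cap-drop", "ALL", "--security-opt", "no-new-privileges",
            "python:3.12-slim", "python", "-I", "-c", "print(1)"]
weakened = [arg for arg in hardened if arg != "--read-only"]

print(missing_hardening(hardened))
print(missing_hardening(weakened))
```

Pointing the same check at the list returned by build_command turns "the whole lesson is in the flags" into an assertion that fails loudly when a flag disappears.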
When you need more: gVisor and Firecracker
Containers share the host kernel — a kernel bug can be an escape. When the threat model demands a harder boundary:
- gVisor (runsc): a user-space kernel that intercepts syscalls, so untrusted code never talks to the host kernel directly. Drop-in as a Docker runtime (docker run --runtime=runsc …). This is what OpenAI’s and Google’s hosted code execution use.
- Firecracker: lightweight microVMs with a real (but minimal) guest kernel, booting in roughly 125 ms. Per-task VM isolation; it’s what AWS Lambda runs on.
You rarely build these yourself — you reach for a host or runtime that provides them. The point for this course: know the names and the trade-off (stronger isolation, more moving parts) so you can choose deliberately.
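If a host already has gVisor installed and registered with the Docker daemon, switching tiers can be as small as one extra argument. A minimal sketch, assuming a command list shaped like the one build_command returns; with_runtime is a hypothetical helper, and it does not check that the runtime is actually available on the host.

```python
def with_runtime(cmd, runtime="runsc"):
    """Return a copy of a `docker run ...` argument list with --runtime added."""
    if cmd[:2] != ["docker", "run"]:
        raise ValueError("expected a command starting with 'docker run'")
    # Insert right after `docker run` so the flag precedes the image name.
    return cmd[:2] + ["--runtime", runtime] + cmd[2:]


base = ["docker", "run", "--rm", "--network", "none", "python:3.12-slim",
        "python", "-I", "-c", "print('hi')"]
print(with_runtime(base))
```

Every other hardening flag stays in place; the runtime changes what sits between the code and the host kernel, not what the container is allowed to do.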
Untrusted SQL: a locked-down Postgres (opt-in)
The team runs Postgres, so here’s the database version of the sandbox: let SQL run without letting it read everything or change anything. Two halves —
The role (an admin creates it once). It can do almost nothing by default:
CREATE ROLE sandbox NOSUPERUSER NOCREATEDB NOCREATEROLE NOINHERIT LOGIN;
REVOKE ALL ON ALL TABLES IN SCHEMA public FROM sandbox;
GRANT SELECT ON public.products TO sandbox; -- only what it truly needs
The session guards (every app applies them to every untrusted query): a READ ONLY
transaction, a short statement_timeout, bound parameters (never string-formatted
SQL), and a row cap. work/pg_sandbox.py:
import os

def run_guarded(conn, sql: str, params: tuple):
    with conn.transaction():
        with conn.cursor() as cur:
            cur.execute("SET TRANSACTION READ ONLY")           # no writes, no DDL
            cur.execute("SET LOCAL statement_timeout = '2s'")  # can't hog the server
            cur.execute(sql, params)                           # params are bound, not formatted
            return cur.fetchmany(100)                          # cap rows into memory
Inside READ ONLY, any INSERT/UPDATE/CREATE is rejected by Postgres itself — defense
that doesn’t depend on you remembering to check. String-formatting a value into the SQL
(f"… WHERE id = {x}") throws all of this away: that’s SQL injection, the database cousin
of the prompt injection in Section 20. Always pass parameters.
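The difference between the two paths is easy to see without a Postgres server. The sketch below uses Python's built-in sqlite3 as a stand-in, which is an assumption for portability: sqlite3 uses ? placeholders where psycopg uses %s, and the table and rows are made up. The principle carries over unchanged: a formatted value becomes part of the SQL text, while a bound value is only ever data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "widget"), (2, "gadget"), (3, "gizmo")],
)

untrusted = "1 OR 1=1"   # attacker-controlled "id"

# String-formatted: the input is spliced into the SQL and rewrites the WHERE clause.
formatted_rows = conn.execute(
    f"SELECT name FROM products WHERE id = {untrusted}"
).fetchall()

# Parameterized: the input is compared as a single value, so nothing matches.
bound_rows = conn.execute(
    "SELECT name FROM products WHERE id = ?", (untrusted,)
).fetchall()

print(len(formatted_rows), "rows via string formatting")
print(len(bound_rows), "rows via bound parameter")
```

The formatted query returns every row in the table; the bound query returns none, because no product has the literal id "1 OR 1=1".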
Opt-in: the runnable examples/16/pg_sandbox.py only does anything if you set DATABASE_URL to a throwaway dev database you control (and have psycopg installed). Skip this part otherwise: without DATABASE_URL the script prints a notice and exits cleanly, and the rest of the section stands on its own.
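The opt-in behavior described above can be sketched as a small guard. This is not the contents of examples/16/pg_sandbox.py, only an illustration of the pattern; connect_if_opted_in is a hypothetical name, and the env parameter exists purely so the guard can be exercised without touching the real environment.

```python
import importlib.util
import os


def connect_if_opted_in(env=os.environ):
    """Return a psycopg connection, or None if the user has not opted in."""
    url = env.get("DATABASE_URL")
    if not url:
        print("DATABASE_URL not set; skipping the Postgres demo.")
        return None
    if importlib.util.find_spec("psycopg") is None:
        print("psycopg not installed; skipping the Postgres demo.")
        return None
    import psycopg                 # imported lazily so the guard works without it
    return psycopg.connect(url)


conn = connect_if_opted_in({})     # empty mapping simulates an unset DATABASE_URL
print(conn)
```

Checking for the variable before importing the driver means the script exits cleanly on a machine that has neither.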
Audit every execution
Isolation stops damage; an audit log lets you see what was attempted. Emit one structured JSON line per execution — what ran, the allow/deny decision, the exit code, the duration. But a command can carry secrets (a token in an argument, a password in a connection string), so don’t log it raw: store a truncated preview plus a hash, not the full payload.
import hashlib
import json

def emit_audit(*, tool, command, decision, exit_code, duration_ms):
    print(json.dumps({
        "event": "sandbox.exec",
        "tool": tool,
        "command_preview": command[:80],  # truncated, not full
        "command_sha256": hashlib.sha256(command.encode()).hexdigest()[:16],  # fingerprint
        "decision": decision, "exit_code": exit_code, "duration_ms": duration_ms,
    }, separators=(",", ":")))
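The preview is enough for a human to recognize a command; the hash lets you group repeats.
Together they keep most of the payload out of a log store that’s widely readable, though a
secret inside the first 80 characters would still appear in the preview, so redact known
secret patterns before truncating. JSON lines ship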
straight into the Elastic stack — Filebeat tails the file, or you POST to
Elasticsearch’s _bulk endpoint — the same observability discipline from Section 9, now
pointed at security events. The full emitter is examples/16/audit_log.py.
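Truncation shortens a command but does not know where the secret sits: a token in the first 80 characters survives into the preview. One hedge is to mask obvious secrets before truncating. The sketch below is an assumption-laden illustration, not part of the course's emitter: redact, preview_fields, and both regular expressions are hypothetical, and real deployments need patterns matched to the secrets their own tools handle.

```python
import hashlib
import re

# Hypothetical patterns; extend them for your own tools.
_SECRET_PATTERNS = [
    # key=value arguments such as --token=abc123 or password=hunter2
    (re.compile(r"(?i)\b(password|passwd|token|secret|api[_-]?key)=\S+"), r"\1=***"),
    # credentials embedded in a URL: scheme://user:password@host
    (re.compile(r"(://[^/\s:@]+):[^@\s]+@"), r"\1:***@"),
]


def redact(command: str) -> str:
    """Mask values that match a known secret pattern."""
    for pattern, replacement in _SECRET_PATTERNS:
        command = pattern.sub(replacement, command)
    return command


def preview_fields(command: str, limit: int = 80) -> dict:
    """Build the preview and fingerprint fields for an audit record."""
    return {
        "command_preview": redact(command)[:limit],   # redact first, then truncate
        # The hash is taken over the original command so repeats still group together.
        "command_sha256": hashlib.sha256(command.encode()).hexdigest()[:16],
    }


fields = preview_fields("psql postgres://app:hunter2@db/prod -c 'select 1'")
print(fields["command_preview"])
```

Pattern-based redaction only catches what it has been taught to recognize, so it complements the preview-plus-hash approach rather than replacing it.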
Challenges
- Prove the network is gone. Run code in the container that fetches a URL; confirm it fails. Then drop --network none and watch it succeed, so you see what the flag buys. Success: you can toggle the network on and off with one flag.
- Break out of read-only. Against a dev Postgres, run an INSERT through run_guarded and confirm Postgres rejects it. Then try a string-formatted value and a parameterized one with input like 1; DROP TABLE products, and show that only the string-formatted path is dangerous. Success: you can demonstrate the injection and the fix.
- Wire the audit log. Make run_command (Section 15) and run_in_container both emit an audit record, then grep/jq the output for every "decision":"deny". Success: a one-command view of everything that was blocked.
Recap
- Containers add the filesystem/network isolation process limits can’t: --network none, --read-only, --cap-drop ALL, non-root, bounded RAM/PIDs, ephemeral.
- gVisor (user-space kernel) and Firecracker (microVMs) are the stronger tiers hosted tools use; reach for them when a shared kernel isn’t good enough.
- Untrusted SQL: a least-privilege role + READ ONLY + statement_timeout + bound parameters + row caps. String-formatted SQL is injection.
- Audit every execution as metadata-only JSON; ship it to the Elastic stack (Section 9).
Next
Section 17 — Model Context Protocol (MCP): with tools defined (13–14) and their execution isolated (15–16), the next question is how to expose and consume tools as a standard — so a server of tools can be shared across apps. That’s MCP.