the run that outlives the REQUEST
Here is where every agent demo breaks. You ask for something that takes ten minutes — audit a repo, rewrite a directory, work a long plan — and the request/response model has no answer for you. The HTTP connection times out somewhere around thirty seconds. So you hold the socket open and pray the proxy, the load balancer, and the laptop lid all cooperate for ten straight minutes. They don't.
The escape hatch everyone reaches for is the same one: a job queue, a worker pool, a status table, a websocket relay. You end up rebuilding Sidekiq around every agent, the same plumbing in every project, just to answer one question: did it finish yet? The parent lesson sold agents as workers hired for the long run. This page answers the unglamorous operational question underneath that pitch: what process, exactly, is the agent, and who keeps it alive after you hang up?
spawning, DEFINED
1. starting a run as a supervised, named process: one call returns an id in milliseconds; the run works for minutes under a supervisor, addressable by that id from anywhere — including from itself.
Nothing here is a job framework bolted on. The supervisor, the named registry, the process-per-run — these are the BEAM's own primitives, the same ones that have run phone networks for thirty years. An agent run isn't a row in a queue table. It's a process with a supervisor watching it.
one call, an id, a 202
You start a run with one HTTP call. POST /api/run takes a JSON
body, mints an id, and returns before the run does any work:
$ curl -s -X POST $WB_RUNTIME_URL/api/run \
-d '{"system":"You are a careful, capable agent.",
"task":"audit blog/ for dead links and write report.org",
"max_steps":40}'
{"id":"run-1742","status":"running"} ← HTTP 202, milliseconds later
The id is run-<integer> — minted from a process-unique
counter, not a UUID. The 202 is the whole point: it means
accepted, working on it, not done. Your caller is free the
instant it has the id. Three knobs ride in the body:
- max_steps: the loop's ceiling. At this seam the default is 40. The raw agent default underneath is 12; standing workers tick at 60. Different front doors, different budgets.
- model: which model drives the loop. Optional; falls through to the engine default.
- exec: not a parameter so much as a trust grant. It unlocks host-brokered git, publish, image, and OS-workdir tools, never raw bash; that hatch was deleted on purpose. It's honored only when the desktop says so or WB_AGENT_EXEC=1 is set in the environment. Ask for it without the grant and you simply don't get those tools.
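From a script rather than a shell, the same POST takes a few lines of standard-library Python. Treat this as a sketch. The field names come from the curl example above. The Content-Type header is an assumption, since the curl sends none. The helper names build_run_request and start_run are hypothetical. exec is left out because its value format isn't shown here.

```python
import json
import urllib.request
from typing import Optional


def build_run_request(base_url: str, system: str, task: str,
                      max_steps: int = 40,
                      model: Optional[str] = None) -> urllib.request.Request:
    """Build the POST /api/run request; the server mints the id, not the client."""
    body = {"system": system, "task": task, "max_steps": max_steps}
    if model is not None:  # optional: omitted means the engine default
        body["model"] = model
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/run",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: the curl example sends none
        method="POST",
    )


def start_run(base_url: str, system: str, task: str, **knobs) -> str:
    """Return the run id; HTTP 202 should come back long before the run is done."""
    req = build_run_request(base_url, system, task, **knobs)
    with urllib.request.urlopen(req, timeout=10) as resp:
        if resp.status != 202:
            raise RuntimeError(f"expected 202 Accepted, got {resp.status}")
        return json.loads(resp.read().decode("utf-8"))["id"]
```

The caller keeps nothing but the returned id. Everything after that is polling or streaming against it.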
The CLI is the same call wearing a friendlier face:
wbx agent run "audit blog/ for dead links" --model openrouter/…
posts to this endpoint and prints the id. There is no second code path — the
CLI is a client of /api/run, exactly like your curl.
sequenceDiagram
participant C as caller (curl / wbx)
participant W as web.ex
participant S as AgentSession.Sup<br/>(DynamicSupervisor)
participant R as the session process
C->>W: POST /api/run {system, task, max_steps}
W->>W: mint id run-1742
W->>S: start_child(session for run-1742)
S->>R: spawn + register by id
W-->>C: 202 {"id":"run-1742","status":"running"}
Note over R: the run hasn't started yet,<br/>but the 202 is already on its way back
R->>R: NOW the work begins
the receptionist and the WORKER
Here is the load-bearing trick. A run is not one process — it's two, and the split is what makes everything else trivial.
The first process is a supervised GenServer, the
AgentSession. Think of it as a receptionist. It's started under
a DynamicSupervisor, registered by its id in a registry, and its
entire job is to stay responsive. It answers "what's your
status?", "subscribe me to updates", and "here's a human review"
instantly and always, because it never does the slow work itself.
The slow work belongs to a second, unnamed process. When the
session boots, its handle_continue spawns a plain child process
that calls Agent.run and grinds through the model-and-tools loop
for however many minutes it takes. The receptionist holds a handle to it and
keeps taking calls. That's why GET /api/run/:id never blocks
waiting on the model: you're talking to the receptionist, not the worker. The
worker could be ten seconds into a slow tool call and the status answer still
returns instantly.
Lookup is by registry. Hand any of the session calls (status, subscribe, review) an id that isn't registered
and you get a clean :not_found back: no crash, no guessing. And
both the registry and the supervisor are permanent children of the
application's supervision tree, so the spawning machinery is up and waiting
before any HTTP listener accepts a connection.
flowchart TD
app[["Application supervisor"]]
app --> reg["AgentSession.Registry<br/>id → pid"]
app --> sup["AgentSession.Sup<br/>DynamicSupervisor"]
sup --> s1["session run-1742<br/>(receptionist · GenServer)"]
sup --> s2["session run-1743<br/>(receptionist · GenServer)"]
s1 -. "spawn (unlinked)" .-> w1["worker: Agent.run loop<br/>(does the minutes of work)"]
s2 -. "spawn (unlinked)" .-> w2["worker: Agent.run loop"]
style app fill:#9fc4e8,stroke:#121316,stroke-width:2.5px
style s1 fill:#13d943,stroke:#121316,stroke-width:2px
style w1 fill:#ffffff,stroke:#121316
When the worker finishes, it sends the session a done message. The
session's status flips from :running to :done and
every subscriber gets the result. Until then, the session's state carries
everything anyone could ask for: its id, status, the run, the live event
list, its subscribers, and any pending reviews.
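Here is a toy Python model of the split. It is an analogy, not the engine. Threads stand in for BEAM processes, a dict stands in for AgentSession.Registry, and the session state is trimmed to status and result. The names spawn, status, and subscribe are hypothetical. The model does keep the shape that matters:

- spawn returns before the work does.
- status answers without waiting on the worker.
- An unknown id comes back as not_found.
- A worker that raises never reports done.

```python
import itertools
import threading
from typing import Callable, Dict, List, Optional, Union

Notify = Callable[[str, str], None]


class Session:
    """The receptionist: owns the run's state, answers at once, never does slow work."""

    def __init__(self, run_id: str, work: Callable[[], str]) -> None:
        self.run_id = run_id
        self.status = "running"
        self.result: Optional[str] = None
        self.subscribers: List[Notify] = []
        self._lock = threading.Lock()
        # The worker: a separate, unsupervised thread (daemon only so the toy exits cleanly).
        # If work() raises, _done() is never called and status stays "running" for good.
        threading.Thread(target=lambda: self._done(work()), daemon=True).start()

    def _done(self, result: str) -> None:
        with self._lock:
            self.status, self.result = "done", result
            subs = list(self.subscribers)
        for notify in subs:  # every subscriber gets the result
            notify(self.run_id, result)

    def snapshot(self) -> dict:
        with self._lock:  # held for a moment, never across the slow work
            return {"status": self.status, "result": self.result}

    def subscribe(self, notify: Notify) -> None:
        with self._lock:
            if self.status == "running":
                self.subscribers.append(notify)
                return
            result = self.result
        notify(self.run_id, result)  # late subscriber: the run already finished


_registry: Dict[str, Session] = {}  # stands in for AgentSession.Registry: id -> session
_ids = itertools.count(1)           # a process-unique counter, not a UUID


def spawn(work: Callable[[], str]) -> str:
    run_id = f"run-{next(_ids)}"
    _registry[run_id] = Session(run_id, work)
    return run_id  # back immediately; the worker is already running


def status(run_id: str) -> Union[dict, str]:
    session = _registry.get(run_id)
    return session.snapshot() if session else "not_found"


def subscribe(run_id: str, notify: Notify) -> str:
    session = _registry.get(run_id)
    if session is None:
        return "not_found"
    session.subscribe(notify)
    return "ok"
```

The last property, a crashed worker leaving status at running, is the same failure mode the zombie case describes later on this page.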
every step, three SINKS
Each time the loop calls a tool, it produces a step event — one consistent shape, the same one the whole engine speaks:
%{step, agent, tool, args, output, exit_code, error, dur_ms, ts}
That single event fans out to three different places, each with its own job and its own truncation budget. The truncation isn't sloppiness — it's a deliberate gradient, generous where the data is durable and tight where it's ephemeral:
| sink | what it's for | output limit | survives engine restart? |
|---|---|---|---|
| on_step → WebSocket | live UI fanout to every subscriber | 500 chars / frame | no — dies with the socket |
| _steps.jsonl | the append-only ledger, written regardless of any caller | 200 chars / line | yes, it's a file |
| events.org | the readable run transcript, rendered on finish | 300 chars / step | only if the VFS was file-backed |
The middle row is the important one. _steps.jsonl is written
on every step regardless of any caller-supplied on_step — so nothing
escapes by construction. You can ignore the websocket entirely and the
provenance trail still exists on disk. The full event holds up to 4000 chars
of output; what each sink keeps is a deliberate slice of that. Generous at the
source, tight at the edges.
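A small Python model of the fan-out shows the gradient in code. The limits come from the table. Everything else is assumed for illustration:

- the function names;
- the live frame's keys, borrowed from the stream frames shown below;
- the org heading format, borrowed from the events_org snippet in the poll response.

Note the ordering: the ledger write does not depend on on_step at all.

```python
import json
from typing import Callable, List, Optional

FULL_MAX = 4000                              # the full event keeps up to 4000 chars of output
WS_MAX, LEDGER_MAX, ORG_MAX = 500, 200, 300  # per-sink budgets from the table


def with_output_limit(event: dict, limit: int) -> dict:
    return {**event, "output": (event.get("output") or "")[:limit]}


def fan_out(event: dict, ledger_path: str, org_steps: List[str],
            on_step: Optional[Callable[[dict], None]] = None) -> None:
    event = with_output_limit(event, FULL_MAX)
    # Ledger: appended on every step, whether or not a caller supplied on_step.
    with open(ledger_path, "a", encoding="utf-8") as ledger:
        ledger.write(json.dumps(with_output_limit(event, LEDGER_MAX)) + "\n")
    # Live fanout: only when someone is listening.
    if on_step is not None:
        on_step({"type": "step", "step": event["step"], "tool": event["tool"],
                 "output": event["output"][:WS_MAX]})
    # Transcript: collected now, rendered into events.org when the run finishes.
    org_steps.append(f"** step {event['step']}: {event['tool']} :tool_call:\n"
                     f"{event['output'][:ORG_MAX]}")


def render_events_org(org_steps: List[str]) -> str:
    return "* Agent run :session:\n" + "\n".join(org_steps)
```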
One more guard lives at this layer: every tool call is wall-clock bounded at 150 seconds. A tool that wedges gets shut down and becomes a tool error the model sees — never a stalled run. The loop keeps going; the model decides what to do about the failure.
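The 150-second guard is the familiar run-with-a-deadline pattern. Here is a Python sketch of it, with one honest gap. Python cannot kill a thread, so where the engine shuts a wedged tool down, this version abandons it on a daemon thread and moves on. The result keys (output, error, dur_ms) borrow names from the step event. call_tool_bounded is a hypothetical name.

```python
import queue
import threading
import time
from typing import Any, Callable, Dict

TOOL_TIMEOUT_S = 150.0  # the per-call wall-clock bound described above


def call_tool_bounded(tool: Callable[..., Any], *args: Any,
                      timeout_s: float = TOOL_TIMEOUT_S) -> Dict[str, Any]:
    """Run one tool call; a hang or a crash comes back as a tool error, never a stall."""
    box: "queue.Queue[tuple]" = queue.Queue(maxsize=1)

    def runner() -> None:
        try:
            box.put(("ok", tool(*args)))
        except Exception as exc:  # a crashing tool is also just an error the model sees
            box.put(("error", f"{type(exc).__name__}: {exc}"))

    # daemon=True: a wedged call is abandoned here, not killed (Python can't kill threads).
    threading.Thread(target=runner, daemon=True).start()
    start = time.monotonic()
    try:
        kind, value = box.get(timeout=timeout_s)
    except queue.Empty:
        kind, value = "error", f"tool timed out after {timeout_s:g}s"
    dur_ms = int((time.monotonic() - start) * 1000)
    if kind == "ok":
        return {"output": value, "error": None, "dur_ms": dur_ms}
    return {"output": "", "error": value, "dur_ms": dur_ms}
```

The loop then treats the returned error like any other tool output: it goes back to the model, and the next step proceeds.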
poll or STREAM
Because the receptionist was accumulating every step all along, watching a run is just a registry lookup. Two ways to do it.
Poll with GET /api/run/:id. Mid-run you get a running
snapshot; minutes later, a finished one:
$ curl -s $WB_RUNTIME_URL/api/run/run-1742 ← mid-run
{"status":"running","steps":7,"live":[…],"reviews":[]}
$ curl -s $WB_RUNTIME_URL/api/run/run-1742 ← minutes later
{"status":"done","steps":12,"result":"…",
"tools":["shell","fetch","vfs_write","done"],
"events_org":"* Agent run :session:\n** step 0: shell :tool_call:…",
"reviews":[]}
The done payload carries the distinct tool names used, the result, and the
whole events.org transcript inline. One honest quirk to script
around: an unknown id here returns HTTP 200 with
{"error":"no such run"} — not a 404. Check the body, not just the
status code.
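A polling script that respects that quirk checks the body for an error key before it trusts the status. This is a sketch. wait_for_run, its budget and interval defaults, and the assumption that a real snapshot never carries a top-level error key are illustrative choices, not part of the runtime. The budget matters for a reason covered later: a run that still says running long past its budget may be a zombie rather than a slow worker.

```python
import json
import time
import urllib.request
from typing import Callable


def http_get_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_run(base_url: str, run_id: str, budget_s: float = 900.0,
                 every_s: float = 5.0,
                 get_json: Callable[[str], dict] = http_get_json) -> dict:
    """Poll GET /api/run/:id until done, an unknown id, or the budget runs out."""
    url = f"{base_url.rstrip('/')}/api/run/{run_id}"
    deadline = time.monotonic() + budget_s
    while True:
        body = get_json(url)
        # An unknown id is HTTP 200 with {"error": "no such run"}, so the body decides.
        if "error" in body:
            raise LookupError(f"{run_id}: {body['error']}")
        if body.get("status") == "done":
            return body  # carries result, tools, and the events.org transcript
        if time.monotonic() >= deadline:
            # Past the budget, "running" is suspect rather than proof of progress.
            raise TimeoutError(f"{run_id} still {body.get('status')!r} after {budget_s:g}s")
        time.sleep(every_s)
```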
Stream with GET /api/run/:id/stream, a WebSocket
upgrade with a 10-minute idle timeout. On connect it subscribes you, then
pushes a frame per step, then a final done frame:
{"type":"subscribed","id":"run-1742"}
{"type":"step","step":0,"tool":"shell","output":"blog/2026-05-01.org\nblog/…"}
{"type":"step","step":1,"tool":"fetch","output":"fetch failed: HTTP 404"}
{"type":"done","result":"3 dead links found; report.org written"}
Those output values are sliced to 500 chars per frame — enough
to watch the run think, not the full payload. A bad id over the socket gets
{"type":"error","error":"no such run"}.
sequenceDiagram
participant U as UI (WS client)
participant A as AgentStream
participant S as session run-1742
U->>A: GET /api/run/run-1742/stream (upgrade)
A->>S: subscribe
A-->>U: {"type":"subscribed","id":"run-1742"}
S-->>A: step 0 (shell)
A-->>U: {"type":"step","step":0,"tool":"shell", …}
S-->>A: step 1 (fetch)
A-->>U: {"type":"step","step":1,"tool":"fetch", …}
S-->>A: run done
A-->>U: {"type":"done","result":"3 dead links found …"}
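Folding the stream into a result is a small loop over frames. The sketch below takes the frames as already-received JSON text, so it works with whatever WebSocket client you use. consume_stream is a hypothetical name. Treating a close without a done frame as an error is an assumption; the 10-minute idle timeout above is one way that can happen.

```python
import json
from typing import Callable, Iterable


def consume_stream(frames: Iterable[str],
                   on_step: Callable[[dict], None] = lambda frame: None) -> str:
    """Fold the stream's JSON text frames into the run's final result."""
    for raw in frames:
        frame = json.loads(raw)
        kind = frame.get("type")
        if kind == "error":  # e.g. {"type":"error","error":"no such run"}
            raise LookupError(frame.get("error", "stream error"))
        if kind == "step":
            on_step(frame)  # output arrives already cut to 500 chars
        elif kind == "done":
            return frame.get("result", "")
        # "subscribed", and any frame type this sketch doesn't know, is ignored
    # The socket closed first; the run itself may still be going.
    raise ConnectionError("stream ended without a done frame; poll the id instead")
```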
The rule of thumb: a script that wants a final answer polls; a UI that wants to show the run thinking streams. Same data, two doors.
the run that knows its NAME
Here's the move that turns spawning into a building block instead of a
convenience. When the runtime spawns a run, it injects WB_RUN=<run
id> into the agent's environment. The run knows its own name.
That sounds small until you see what it enables: a run can build a URL that
routes back into its own mailbox. A connect URL like
…/api/ctk/commit?run=$WB_RUN lets the run hand a human a place to
send a decision, then call wb ctk await $WB_RUN and block until
it arrives. The review lands via POST /api/ctk/commit?run=<id>,
gets pushed live to subscribers and queued FIFO, and the agent polls for it —
a 204 when there's nothing yet. Reviews even persist to a per-run JSONL file,
so the decision record survives the process. This is the primitive under
human-in-the-loop, and it has its own deep dive.
→ The full review loop is the human-in-the-loop lesson. This is just the seam it stands on: a run that can address itself.
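The run side of that seam can be sketched in Python, as a stand-in for what wb ctk await does, not its implementation. Only the /api/ctk/commit?run= route, WB_RUN, and the 204 are from the text. The polling endpoint is not named here, so poll is a hypothetical callable returning (status, body). The assumption that a review arrives as a 200 with a JSON body, and the function names, are illustrative.

```python
import json
import os
import time
import urllib.parse
from typing import Callable, Tuple


def review_connect_url(base_url: str) -> str:
    """The URL a run hands a human; it routes back into this run's own mailbox."""
    run_id = os.environ["WB_RUN"]  # injected by the runtime when the run was spawned
    return f"{base_url.rstrip('/')}/api/ctk/commit?run={urllib.parse.quote(run_id)}"


def await_review(poll: Callable[[], Tuple[int, bytes]],
                 every_s: float = 2.0, budget_s: float = 3600.0) -> dict:
    """Block until a review arrives; poll() is hypothetical and returns (status, body)."""
    deadline = time.monotonic() + budget_s
    while True:
        code, body = poll()
        if code == 200:
            return json.loads(body)  # assumption: one queued review as a JSON object
        if code != 204:  # 204 means nothing queued yet
            raise RuntimeError(f"unexpected HTTP {code} while awaiting review")
        if time.monotonic() >= deadline:
            raise TimeoutError("no review arrived within the budget")
        time.sleep(every_s)
```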
the other spawner: born at BOOT
Everything so far described runs born on demand — one
POST /api/run, one process under the DynamicSupervisor. There's a
second spawner with the same engine inside. Standing agents are born at boot
from a declared manifest, under a static supervisor.
You declare them in a small org manifest. Headings are agent names; properties point at definitions and tune cadence:
#+TITLE: crew
* wren
:PROPERTIES:
:DEF: /data/agents/writer.org
:LIFECYCLE: /data/lifecycles/writer.org
:INTERVAL: 10m
:END:
* moss
:PROPERTIES:
:DEF: /data/agents/editor.org
:END:
Point WB_CREW_DEF at that file and the supervisor starts one
worker per heading. Walk the math the manifest implies: wren first
ticks at boot-grace plus zero; moss starts staggered by
i × WB_CREW_STAGGER_MS — 30 seconds later by default — so two
agents don't slam the engine in the same instant. Wren has a 10-minute
interval; moss declared none, so it defaults to once an hour. Both runs queue
on a global Gate that caps concurrency at 2; a worker acquires
before its run and releases in an after block, so a wedged run
can't starve its peers. Each run executes in a linked task killed at
15 minutes wall clock. And cadence survives restarts: a
keeper-last-run-wren file under WB_DATA means a
reboot doesn't reset the clock to zero.
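The stagger and cadence math is easy to check in a few lines of Python. This sketch parses only what the manifest above uses: top-level headings and :KEY: value properties. It is not a general org parser. The boot-grace value is a parameter because the text doesn't give one, and the s and h interval suffixes are assumptions alongside the m that appears in the manifest.

```python
import re
from typing import Dict, List, Tuple

DEFAULT_INTERVAL_S = 3600             # no :INTERVAL: means once an hour
DEFAULT_STAGGER_S = 30                # WB_CREW_STAGGER_MS defaults to 30 seconds
UNITS = {"s": 1, "m": 60, "h": 3600}  # only "m" appears in the manifest; s and h are assumed

Member = Tuple[str, Dict[str, str]]


def parse_crew(text: str) -> List[Member]:
    """Top-level headings become members; ':KEY: value' lines become their properties."""
    members: List[Member] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("* "):
            members.append((line[2:].strip(), {}))
        elif members and (m := re.match(r":([A-Z_]+):\s+(\S.*)$", line)):
            members[-1][1][m.group(1)] = m.group(2).strip()
    return members


def interval_s(props: Dict[str, str]) -> int:
    raw = props.get("INTERVAL")
    if not raw:
        return DEFAULT_INTERVAL_S
    return int(raw[:-1]) * UNITS[raw[-1]]


def schedule(text: str, boot_grace_s: float,
             stagger_s: float = DEFAULT_STAGGER_S) -> List[Tuple[str, float, int]]:
    """(member, first tick in seconds after boot, interval in seconds), in manifest order."""
    return [(name, boot_grace_s + i * stagger_s, interval_s(props))
            for i, (name, props) in enumerate(parse_crew(text))]
```

With a hypothetical 5-second boot grace, this yields wren at 5 s, then every 600 s, and moss at 35 s, then every 3600 s. That matches the walk-through above.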
| | ad-hoc run | standing agent |
|---|---|---|
| spawner | DynamicSupervisor | static Supervisor (one per member) |
| born | on demand — a POST | at boot, staggered (i × 30s) |
| trigger | POST /api/run / wbx agent run | interval cadence from the manifest |
| concurrency | unbounded by design | Gate — max 2 at once |
| per-run clock | max_steps + 150s/tool | 15-minute wall clock per tick |
| interior | the same Agent.run loop | the same Agent.run loop |
The punchline is the last row. The keeper tick goes through the same path
as /api/run — same loop, same event shape, same gates — just with
its workdir set to a tenant repo and the exec grant on. Idle agents back off
too: a streak of no-work runs grows the gap exponentially, capped at 30
minutes, and a single completed run snaps it back. The orchestration
and fleets lessons cover this manifest in full; this is
the trailhead.
bounded at every LAYER
A run could hang in a dozen places — a tool that never returns, a loop that never converges, a socket nobody closes. The design's answer is a ladder of time bounds, each one converting a hang into a signal the next layer up can act on:
| bound | value | who enforces it | what a hang becomes |
|---|---|---|---|
| tool call | 150 s | the agent loop | a tool error the model sees |
| loop length | max_steps (40 / 12 / 60) | the run itself | a clean finish, not an infinite loop |
| keeper tick | 15 min | the standing worker | a killed task, peers freed |
| WS idle | 10 min | the stream socket | a closed connection, run untouched |
| no-work backoff | → 30 min cap | the standing worker | an idle agent that stops burning calls |
Read the table's verdict in one line: every bound turns a stall into a fact something else can handle. A wedged tool doesn't stall the run — it becomes an error the model routes around. A streamed socket going idle doesn't keep the run hostage — it just closes, and the run, which never cared about your socket, keeps working. These aren't safety nets bolted on after the fact. They're the shape of the thing.
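The first two rows also combine into a rough ceiling you can budget against. Assume each step makes at most one tool call (a step event is emitted per tool call), and set aside model latency, which has no bound in this table. Under those assumptions, an ad-hoc run at the default ceiling spends at most this long inside tools:

$$
T_{\text{tools}} \;\le\; \text{max\_steps} \times 150\,\text{s} \;=\; 40 \times 150\,\text{s} \;=\; 6000\,\text{s} \;=\; 100\,\text{min}
$$

Model calls come on top of that. So 100 minutes is a lower bound for a polling budget, not a promise of completion. On the same arithmetic, a standing agent with max_steps at 60 could reach 9000 s, which is why its 15-minute tick clock, not the step ceiling, is the bound that bites.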
what dies, what SURVIVES
The honest limits, stated plainly, because they decide how you script against this.
By default a run's VFS is thrown away when the run ends. The spawned loop opens its
own VFS, and the default is :memory:. The moduledoc's promise of
a resumable, replicable run is true only when the caller passes a
durable :vfs path. No path, no residue from the VFS.
Engine restart kills every in-flight run. Sessions are supervised
children, not persisted across a BEAM restart. What survives a restart is the
durable residue: _steps.jsonl, the per-run review JSONL, and
events.org if the VFS was file-backed. The live sessions
themselves are gone.
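Since _steps.jsonl is the record that outlives the engine, a post-mortem can start from it alone. Here is a sketch, assuming one JSON object per line with the step event's keys. summarize_ledger is a hypothetical name. Stopping at an unparseable line is a defensive guess about writes torn by a crash. Also remember that each output in this file is already cut to 200 chars, so full tool output can't be recovered from it.

```python
import json
from typing import Dict, List


def summarize_ledger(path: str) -> Dict[str, object]:
    """Rebuild what a run did from _steps.jsonl alone, e.g. after an engine restart."""
    steps: List[dict] = []
    with open(path, encoding="utf-8") as ledger:
        for line in ledger:
            line = line.strip()
            if not line:
                continue
            try:
                steps.append(json.loads(line))
            except json.JSONDecodeError:
                break  # assumed: a torn final line from a crash mid-write; keep what parsed
    return {
        "steps": len(steps),
        "tools": sorted({s["tool"] for s in steps if s.get("tool")}),
        "errors": [s.get("step") for s in steps if s.get("error")],
        "last_step": steps[-1].get("step") if steps else None,
    }
```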
A hard-crashed worker leaves a zombie. The worker process is a plain
spawn — unlinked, unmonitored. If it crashes outright (as opposed to an LLM
error, which the loop converts into a clean error: … finish), the
session never hears the done message and shows status: running
indefinitely. We didn't find a sweeper that reaps these, so treat a run stuck
on running past your timeout budget as suspect, not as still-working.
A session crash re-runs, it doesn't resume. The session GenServer
uses OTP's default permanent restart. If the session itself crashes,
the supervisor restarts it with the same id and task — and
handle_continue runs the whole task again from step zero.
Subscribers and live history are lost. This is inferred from the OTP defaults,
not an explicit choice in the code, but it's the behavior to expect.
None of this is hidden. The truncation tiers are real, the 200-not-404 quirk is real, and the durable trail is exactly three files. Build on what's durable; don't assume the live process is.
questions people actually ASK
Does the run survive my laptop closing?
Yes — if the engine is somewhere else (your deployed runtime on Fly, say). The caller is disposable by design; closing your laptop just drops the client that held the id. Reconnect later and poll the id. If the engine is your laptop, then closing it stops the engine, and the run with it.
Can I cancel a run?
Honestly: we found no kill endpoint in the surface this lesson covers. The
brakes are max_steps and the time bounds — 150s per tool, the
loop ceiling, the 15-minute keeper clock. If you need a hard cancel, verify
the current API before relying on one; don't assume it exists because it
feels like it should.
How many runs can go at once?
Ad-hoc runs are unbounded by design — each POST /api/run is
its own supervised process, and the DynamicSupervisor doesn't cap them.
Standing agents are different: the Gate caps them at 2 concurrent by default
(WB_CREW_MAX_CONCURRENT), precisely so a fleet of them doesn't
overwhelm the engine.
Why did my run say running forever?
Almost certainly the zombie case: the unlinked worker crashed hard, so the session never got its done message. An LLM error wouldn't do this — those get converted to a clean finish. A genuine process crash leaves the status stuck. Past your timeout budget, treat running as suspect.
Should I poll or stream?
Poll from a script that wants the final answer — one
GET /api/run/:id when you expect it's done, and read the result
and tools off the payload. Stream from a UI that wants to show the run
thinking, frame by frame. Both read the same session; the choice is about
your client, not the run.
Is this a job queue under the hood?
No — and the distinction matters. There's no queue table, no worker pool you provision, no relay you stand up. It's a supervised process per run, named in a registry, on the BEAM. The thing that runs phone networks runs your agent.
keep GOING
Spawning is the mechanics under the parent lesson. From here, go inward to what the process actually does, or outward to who calls spawn on a cadence.