the agent that CAN'T let go
Ask a conversational agent — voice, chat — for a twenty-minute research job, and you hit one of two failure modes. Either the turn freezes while it works, and you sit there listening to silence on a phone call. Or it cheerfully promises to "look into that" and then forgets, because the promise lived in a context window that ended. There is no hand-off seam, no ledger, no way for the work to reach back and say it's done.
And if the agent does spawn background work, that work is usually invisible. It isn't in git, it isn't reviewable, and when it dies it dies silently — no record that it ran, crashed, or timed out. The most plan-hungry collaborators we've ever had arrived, and we let them start errands nobody could see, audit, or hear back from.
This sub-lesson is the seam that fixes both halves. It sits under workflows — that parent lesson taught the grammar a plan is written in. This one teaches who calls the runner, how a goal becomes a plan without the agent writing it, and exactly how the answer finds its way back into the conversation.
the DEFINITION
1. handing a goal across a seam so the conversation never blocks: a second model authors a workflow per request, a supervisor runs it time-bounded on a task ledger, every terminal state queues as feedback, and the whole ledger renders to a git-visible file.
The word is borrowed from a mail room, on purpose. You don't carry the parcel to its destination — you hand it to a system that takes it from there, gives you a tracking number, and lets you walk away. Everything in this lesson is one of those four clauses made concrete, and every clause is proven by a single real run we'll follow the whole way down.
one goal, END to end
Here is the keystone — the full life of one dispatched goal, from a
voice agent's request to a task id, with no human waiting on any of it. The
voice agent says one thing — dispatch(goal) — and the call
returns before the work has even started. Read the lanes left to right:
the agent hands a goal to the /gk router; the router
asks a cheap author model to turn that goal into an org plan; the plan is
validated and persisted; a task lands on the ledger; the runner starts in
the background; and the router returns a task id immediately:
sequenceDiagram
    participant A as conversational agent
    participant R as /gk router
    participant M as author model (Mercury)
    participant L as task ledger
    participant T as Workflow.Todo runner
    A->>R: dispatch(goal)
    R->>M: turn this goal into an org TODO outline
    M-->>R: a #+TITLE header, then TODO headlines
    Note over R: validate — at least one TODO headline
    R->>R: persist to workflows/the-slug.org
    R->>L: add task — gk-201 — queued then doing
    L->>T: spawn_monitor — run the plan in runs/gk-201/
    R-->>A: dispatched true, task gk-201, the workflow path
    Note over A,R: returns in milliseconds — the conversation continues
    Note over T: minutes later, the run finishes, queued as feedback for the next turn
The single most important arrow is the last one, the router's reply to the agent: the router answers before the runner has done any real work. Dispatch is fire-and-record, not fire-and-wait. The run executes on its own time, and the agent finds out how it went later, which is the subject of the next three sections.
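The fire-and-record shape can be sketched in a few lines. This is a Python illustration of the flow in the diagram, not the Elixir implementation: `author_plan`, `validate`, `run_plan`, the `ledger` dict, and the id counter are hypothetical stand-ins, and persistence of the plan file is left out.

```python
import re
import threading
import time
from itertools import count

_ids = count(201)   # hypothetical id sequence, styled after gk-201
ledger = {}         # task id -> record; stands in for the ledger process


def author_plan(goal):
    """Placeholder for the author model: returns an org TODO outline."""
    return f"#+TITLE: {goal}\n* TODO {goal}\n"


def validate(plan):
    """The one check the diagram names: at least one TODO headline."""
    return re.search(r"^\*+ TODO ", plan, re.MULTILINE) is not None


def run_plan(task_id, work):
    """Background runner: does the work, then records a terminal status."""
    work()
    ledger[task_id]["status"] = "done"


def dispatch(goal, work=lambda: time.sleep(0.2)):
    """Record the task, start the run in the background, return at once."""
    plan = author_plan(goal)
    if not validate(plan):
        return {"dispatched": False, "reason": "no TODO headline"}
    task_id = f"gk-{next(_ids)}"
    ledger[task_id] = {"goal": goal, "status": "queued"}
    ledger[task_id]["status"] = "doing"   # flips the instant the run is spawned
    thread = threading.Thread(target=run_plan, args=(task_id, work), daemon=True)
    thread.start()
    return {"dispatched": True, "task": task_id}
```

The caller gets its task id while the ledger still says `doing`; nothing in `dispatch` waits on the thread.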
four states, no FIFTH
The ledger is a small in-memory process — a GenServer holding tasks,
feedback, and monitors. Every task is a record with an id, the goal, the
plan file, a status, timestamps, and a result. The ids look like
gk-201. And the status moves through exactly four values, no
more. Picture the graph: a task is born queued, becomes
doing the instant the run is spawned, and from there it can only
reach two terminals — done if the run finishes, or blocked if
it crashes or runs out of time. There is no "stuck" and no "unknown":
stateDiagram-v2
    [*] --> queued: dispatch
    queued --> doing: spawn_monitor
    doing --> done: run finished, summarized
    doing --> blocked: crashed — abnormal DOWN
    doing --> blocked: timed out after 35m
    done --> [*]
    blocked --> [*]
The two blocked edges are the whole reason this is trustworthy. When the
run is launched, the ledger does a spawn_monitor and wraps the
work in a try/catch, so a clean finish sends itself back as a result. But it
also arms a kill switch: a Process.send_after timer set to
35 minutes. If that timer fires and the task is still doing, the
process is killed, the status becomes blocked, and the result reads
timed out after 35m. And if the worker dies without reporting —
an abnormal monitor signal while doing — the task becomes blocked with the
crash reason captured. A crash or timeout becomes status blocked with the
reason, never a stuck doing. That invariant is the difference between a
ledger you can trust and a board full of zombies.
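The invariant is easiest to see as a transition table. The sketch below is Python and models only the state rules described above, not the GenServer, the monitor, or the timer; the function names and the `crashed: ...` result wording are assumptions made for illustration.

```python
# Legal moves between the four statuses; terminals have no way out.
ALLOWED = {
    "queued": {"doing"},
    "doing": {"done", "blocked"},
    "done": set(),
    "blocked": set(),
}


def transition(task, new_status, result=None):
    """Apply one status change, refusing anything the graph does not allow."""
    if new_status not in ALLOWED[task["status"]]:
        raise ValueError(f"illegal transition {task['status']} -> {new_status}")
    task["status"] = new_status
    if result is not None:
        task["result"] = result
    return task


def on_timeout(task, minutes=35):
    """Timer fired: acts only if the task is still doing."""
    if task["status"] == "doing":
        transition(task, "blocked", f"timed out after {minutes}m")
    return task


def on_down(task, reason):
    """Monitor signal: an abnormal exit while doing becomes blocked."""
    if task["status"] == "doing" and reason != "normal":
        transition(task, "blocked", f"crashed: {reason}")
    return task
```

Both handlers check for `doing` first, so a timer or a monitor signal that arrives after the task already reached a terminal state changes nothing.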
When a run does finish cleanly, the ledger summarizes it rather than
storing the raw output. A finished workflow reports its tally —
workflow finished: %{"DONE" => 4, "FAILED" => 2} — the
frequency of each leaf's final state. Other results are inspected and sliced
to 300 characters; errors are sliced the same way and mark the task blocked.
The ledger keeps a verdict, not a transcript.
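A rough Python rendering of that summarizing step, assuming outcomes arrive as a `(kind, value)` pair. The pair shape and the `summarize` name are hypothetical, and the tally prints as a Python dict rather than an Elixir map.

```python
from collections import Counter

LIMIT = 300  # characters kept from any non-workflow result


def summarize(outcome):
    """Reduce a run's outcome to (status, short summary)."""
    kind, value = outcome
    if kind == "workflow":
        # value is the list of final leaf states, e.g. ["DONE", "FAILED", ...]
        tally = dict(Counter(value))
        return "done", f"workflow finished: {tally}"
    if kind == "error":
        return "blocked", repr(value)[:LIMIT]
    return "done", repr(value)[:LIMIT]
```

Note that a finished workflow is `done` even when some leaves failed: the tally carries that detail, and the status only says the run ended cleanly.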
hearing back: the DRAIN
So the run finishes minutes after the agent moved on. How does the answer get back into a conversation that's already three turns ahead? Not with a callback that interrupts mid-sentence — with a queue the agent drains on its own schedule. When a task reaches a terminal state, a feedback note is prepended to a list: the task id, the final status, a short summary, and the original goal. It just sits there until the agent asks.
The agent asks by calling tasks — a natural thing to do at
the top of a turn. That call returns both the current task list and a
drain of the feedback queue, oldest-first, and the drain
clears the queue. Read-once. Picture the timeline: the run finishes
while you and the agent are talking about something else; a note slips onto
the queue; on the agent's next turn it drains the note and relays it to you
in plain language. The loop closes at conversation speed, not interrupt
speed:
sequenceDiagram
    participant U as you
    participant A as the agent
    participant L as the ledger
    participant T as the run
    U->>A: ask about something else
    T->>L: gk-201 finished — note queued
    Note over L: feedback: [gk-201 done]
    U->>A: next turn — anything at all
    A->>L: tasks
    L-->>A: tasks + drain(feedback) — and the queue clears
    A->>U: by the way, that E2B research is done
That is the entire loop back into the conversation. No push, no ringing
phone mid-thought — the feedback waits for a natural beat. The cost of that
calm is the read-once rule: the second tasks call returns an
empty feedback list, because the first one already took it. Drain it, relay
it, or it's gone.
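The read-once drain is small enough to sketch whole. This Python class is an illustration of the queue behaviour described above, with assumed names (`Ledger`, `finish`, `tasks_call`); the real ledger is an Elixir process.

```python
class Ledger:
    """Holds tasks plus a feedback list that is drained read-once."""

    def __init__(self):
        self.tasks = []
        self.feedback = []  # newest note first, because notes are prepended

    def finish(self, task_id, status, summary, goal):
        """A task reached a terminal state: prepend a feedback note."""
        note = {"id": task_id, "status": status, "summary": summary, "goal": goal}
        self.feedback.insert(0, note)

    def tasks_call(self):
        """Return the task list and the drained notes, oldest first."""
        notes = list(reversed(self.feedback))
        self.feedback = []  # the drain clears the queue
        return {"tasks": list(self.tasks), "feedback": notes}
```

Calling `tasks_call()` twice in a row shows the cost: the first call returns the notes, the second returns an empty feedback list.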
a ledger you can read in GIT
Depth rung. The ledger lives in memory — but it also renders itself to a
file, TASKS.org, on every change. This is the same
one-way discipline the workflows lesson drew: the
file is generated from the ledger and never hand-edited. There is no
two-way sync, because there is only one direction. Each ledger field maps to
a piece of the rendering:
| ledger field | how it renders in TASKS.org |
|---|---|
| status | a TODO keyword on the headline — queued→TODO, doing→DOING, done→DONE, blocked→BLOCKED |
| goal | the headline text after the keyword |
| id · file · started · finished | a :PROPERTIES: drawer under the headline |
| result | a plain line below the drawer — the summary or the failure reason |
| (order) | entries sorted newest-first |
The verdict of that table in one sentence: the rendered file carries
every fact the ledger knows, in org's own grammar, so the whole task history
is greppable, diffable, and reviewable in version control. The file even
declares its own keyword set with a #+TODO: line, so any org
reader colours the columns correctly. This is the same move
BOARD.org makes — regenerated from the issue tracker,
one-direction, never hand-edited. Generated files, owned by the generator.
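The mapping in the table can be sketched as a one-way render function. This is a Python approximation under stated assumptions: the `:FINISHED:` property name is inferred from the table, empty properties are omitted, and the title is a parameter rather than the real file's title.

```python
# Ledger status -> org TODO keyword, as listed in the table.
KEYWORD = {"queued": "TODO", "doing": "DOING", "done": "DONE", "blocked": "BLOCKED"}


def render(tasks, title="dispatched tasks"):
    """Render ledger records to org text, newest task first."""
    out = [f"#+TITLE: {title}", "#+TODO: TODO DOING BLOCKED | DONE"]
    # Timestamps like "[2026-06-11 04:58:28]" sort correctly as strings.
    for task in sorted(tasks, key=lambda t: t["started"], reverse=True):
        out.append(f"* {KEYWORD[task['status']]} {task['goal']}")
        out.append(":PROPERTIES:")
        for key in ("id", "file", "started", "finished"):
            if task.get(key):
                out.append(f":{key.upper()}: {task[key]}")
        out.append(":END:")
        if task.get("result"):
            out.append(task["result"])  # summary or failure reason
    return "\n".join(out) + "\n"
```

There is no matching parse function, which is the point: the file is output only.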
the orphan REWRITE
Here is the honest edge that makes the whole thing trustworthy, and it's
the best detail on the page. The ledger is in-memory. So when the engine
restarts, every task that was doing is gone from the
GenServer — but the last rendered TASKS.org on disk still says
DOING. The file would be lying about what's alive:
claiming a run is in progress when no process exists.
The fix is a render correction, not a sync. On boot, before serving
anything, the renderer rewrites the file in place: any headline that's still
in an active state — DOING or TODO — is rewritten
to BLOCKED, and a comment is injected at the top to say why.
It's not pretending the work resumed; it's refusing to let a generated file
lie. We own that file, so we correct it. This is our real
TASKS.org after the restart that orphaned our run — note the
comment on line two, and the headline that now reads BLOCKED:
#+TITLE: groundskeeper — dispatched tasks
# orphaned actives marked BLOCKED on boot
#+TODO: TODO DOING BLOCKED | DONE
* BLOCKED Research E2B (e2b.dev) and Daytona (daytona.io) pricing: …
:PROPERTIES:
:ID: gk-201
:FILE: workflows/research-e2b-e2b-dev-and-daytona-daytona-3.org
:STARTED: [2026-06-11 04:58:28]
:END:
This is the full lifecycle in one screenshot, including the unglamorous
ending. gk-201 wasn't a failure — it was orphaned on restart,
and the file says so plainly rather than leaving a stale DOING to mislead
the next reader. The boundary is exact: the file never claims a liveness it
can't back. That's the same lesson the autopoet
taught about supervised, time-bounded work — blocked-not-stuck, the reason
always recorded.
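The boot-time correction amounts to a text rewrite over the last rendered file. A minimal Python sketch, assuming the comment text shown in the file above and assuming it is inserted as line two; `rewrite_orphans` is a hypothetical name, and the real renderer is Elixir.

```python
import re

NOTE = "# orphaned actives marked BLOCKED on boot"
# Headlines only: one or more stars, then an active keyword.
# The "#+TODO:" declaration line does not start with a star, so it is untouched.
ACTIVE = re.compile(r"^(\*+) (TODO|DOING) ", re.MULTILINE)


def rewrite_orphans(text):
    """Mark active headlines BLOCKED and note why; leave clean files alone."""
    fixed, changed = ACTIVE.subn(r"\1 BLOCKED ", text)
    if changed == 0:
        return text
    lines = fixed.split("\n")
    if NOTE not in lines:
        lines.insert(1, NOTE)  # directly under the title line
    return "\n".join(lines)
```

Running it a second time is a no-op, since no active headlines remain to rewrite.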
what a run LEAVES behind
Depth rung. Underneath the ledger, the actual execution is the
workflow runner from the parent lesson — native org
as the state machine, ordered leaves as a pipeline, parallel leaves via
bounded concurrency, :done-when: as the acceptance gate. Each
dispatched run gets its own working directory, runs/<id>/,
and leaves a durable trail in it:
- scratch/ — a persistent shared workspace the leaves read and
write across the run. For our run it holds the real extracted pricing —
scratch/e2b_pricing.org has per-second numbers, down to fractions of a cent per vCPU-second.
- _steps.jsonl — every tool call the leaf agent made, hash-chained and signed when the run is sealed, so the trail can't be quietly edited after the fact.
- _telemetry.db — the run's telemetry, written alongside.
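To make "hash-chained" concrete, here is a generic Python sketch of the technique. It is not the actual `_steps.jsonl` format: the SHA-256 choice, the field names `prev` and `hash`, and the all-zero starting value are assumptions, and the signing step applied at seal time is not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def _digest(prev, step):
    """Hash the previous link together with a canonical form of the step."""
    body = json.dumps(step, sort_keys=True)
    return hashlib.sha256((prev + body).encode()).hexdigest()


def chain(steps):
    """Wrap each step with the hash of everything before it."""
    prev, entries = GENESIS, []
    for step in steps:
        digest = _digest(prev, step)
        entries.append({"step": step, "prev": prev, "hash": digest})
        prev = digest
    return entries


def verify(entries):
    """Recompute every link; any edited step breaks the chain."""
    prev = GENESIS
    for entry in entries:
        if entry["prev"] != prev or _digest(prev, entry["step"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, changing an early step invalidates every entry after it, which is what makes a quiet edit detectable.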
One rule governs the acceptance checks, and it's load-bearing: the
:done-when: commands run in the in-sandbox WASM shell,
never on the host. An author-supplied check is untrusted input, so it must
not become a native-execution vector. The consequence is that the checks
fail closed — if a check command isn't one the WASM shell knows, the
task simply doesn't reach DONE. A plan can't talk its way to success by
naming a command the sandbox refuses to run.
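Fail-closed behaviour can be sketched as a gate that returns true only on an explicit pass. In this Python illustration the command set is hypothetical (the source does not list what the WASM shell supports), `run` stands in for the sandbox executor, and only the first command of a pipeline is inspected.

```python
import shlex

# Hypothetical allowlist; the real shell's command set is not given here.
SANDBOX_COMMANDS = {"test", "grep", "cat", "ls"}


def check_passes(command, run):
    """Return True only if the sandbox knows the command and it exits 0."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unparseable input never passes
    if not argv or argv[0] not in SANDBOX_COMMANDS:
        return False  # unknown command: refuse, do not fall back to the host
    try:
        return run(argv) == 0
    except Exception:
        return False  # an executor error is a failed check, not a pass
```

Every path that is not a clean zero exit returns `False`, so the leaf stays short of DONE by default.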
where it BITES
Honesty section. The most vivid limit is the one we flagged earlier.
Mercury wrote a plan full of curl and mkdir; the
WASM shell has neither. Here are two real lines from our run's
_steps.jsonl — the sandbox biting the author:
{"error":"{:unknown_command, \"mkdir\"}","tool":"shell","step":1,
"args":{"pipeline":"mkdir -p scratch && curl -sL -o scratch/daytona_pricing.html …"}}
{"error":"{:unknown_command, \"curl\"}","tool":"shell","step":0,
"args":{"pipeline":"curl -sL -o scratch/e2b_pricing.html https://e2b.dev/pricing"}}
The plan named the wrong toolchain. But watch what happened next: the leaf
agent adapted — it reached for its fetch tool instead, pulled
the pages, and still produced scratch/e2b_pricing.org with real
per-second pricing. The run partially succeeded despite the wrong
instructions, and the gates that couldn't pass simply left their leaves
un-DONE. The plan is reviewable precisely because the author is fallible, and
the :done-when: gates fail closed instead of pretending.
The rest of the bites are quieter, and all real:
- The ledger is in-memory. A restart orphans active tasks (the BLOCKED rewrite mitigates the lie, but the run itself is gone) — and it also drops any feedback that hadn't been drained yet. Undrained feedback dies with the BEAM. The rewrite saves the file's honesty, not the work.
- Feedback is read-once and global. One queue, drained whole, with no per-conversation routing. If two conversations share a ledger, whoever drains first gets the notes.
- Summaries truncate to 300 characters. The ledger keeps a
verdict, not a transcript — for the full story you read the run's
scratch/ and _steps.jsonl.
- 35 minutes is a wall, not a suggestion. A run that needs longer is killed and marked blocked. Dispatch is for errands, not marathons.
questions people actually ASK
Why doesn't the agent author its own workflows?
Deliberate indirection. The conversational agent hands over a goal; a separate author model turns it into a reviewable org plan. That keeps the conversational model focused on talking, makes the plan a file you can inspect before it runs, and doubles as the self-authoring demo: a goal in one voice becoming a plan in another.
What happens at minute thirty-six?
The kill switch. A 35-minute timer is armed when the run starts; if it fires while the task is still doing, the process is killed, the status becomes blocked, and the result records "timed out after 35m". There's no grace period — the bound is a wall by design.
Can I hand-edit TASKS.org?
You can, but the next ledger change clobbers it — the file is rendered from the in-memory ledger on every change, one direction only. It's a generated view, like a board rendered from a plan. Edit the work, not the rendering.
Why org instead of a job queue?
Because the plan reuses the workflow grammar you already have — ordered leaves, dependencies, done-when gates — so a dispatched job is a reviewable, diffable file before it's a process. A job-queue row is none of those things.
Does feedback survive a restart?
No. Both the ledger and its feedback queue are in-memory. A restart orphans active tasks (rewritten to BLOCKED in the file so it doesn't lie) and drops any feedback that wasn't drained first. Persistence is an honest gap, stated plainly.
Is this the same as the autopoet?
No — and the contrast is clean. The autopoet grows the system, editing its declarative layer over time. Dispatch runs your errands, one bounded job at a time. They share the same supervision lesson — time-bounded, blocked-not-stuck, the reason always recorded — but their subjects are opposite.
keep GOING
Dispatch is a seam on top of constructs you've met — start with the parent, then the worker it runs.