learn / 08·8 — under workflows · dispatch

hand the goalOVERthe wall

Dispatch is the one verb a conversational agent needs for long work: it hands a goal across a seam and gets a task id back in milliseconds. A second model writes the plan, a supervisor runs it on a kill switch, and the answer returns as a note the agent drains on its next turn. No frozen turn, no forgotten promise — and a ledger you can read in git.

dispatch10 min read
A small figure at a control console hands a glowing capsule into a vast pneumatic tube that arcs up into a monumental sorting hall of brass canisters and ledger boards — the capsule sails away while the figure turns calmly back to a conversation — bright, sunlit, 1970s sci-fi style

the agent that CAN'T let go

Ask a conversational agent — voice, chat — for a twenty-minute research job, and you hit one of two failure modes. Either the turn freezes while it works, and you sit there listening to silence on a phone call. Or it cheerfully promises to "look into that" and then forgets, because the promise lived in a context window that ended. There is no hand-off seam, no ledger, no way for the work to reach back and say it's done.

And if the agent does spawn background work, that work is usually invisible. It isn't in git, it isn't reviewable, and when it dies it dies silently — no record that it ran, crashed, or timed out. The most plan-hungry collaborators we've ever had arrived, and we let them start errands nobody could see, audit, or hear back from.

This sub-lesson is the seam that fixes both halves. It sits under workflows — that parent lesson taught the grammar a plan is written in. This one teaches who calls the runner, how a goal becomes a plan without the agent writing it, and exactly how the answer finds its way back into the conversation.

the DEFINITION

dis·patch /dɪˈspætʃ/ noun · verb

1. handing a goal across a seam so the conversation never blocks: a second model authors a workflow per request, a supervisor runs it time-bounded on a task ledger, every terminal state queues as feedback, and the whole ledger renders to a git-visible file.

The word is borrowed from a mail room, on purpose. You don't carry the parcel to its destination — you hand it to a system that takes it from there, gives you a tracking number, and lets you walk away. Everything in this lesson is one of those four clauses made concrete, and every clause is proven by a single real run we'll follow the whole way down.

one goal, END to end

Here is the keystone — the full life of one dispatched goal, from a voice agent's request to a task id, with no human waiting on any of it. The voice agent says one thing — dispatch(goal) — and the call returns before the work has even started. Read the lanes left to right: the agent hands a goal to the /gk router; the router asks a cheap author model to turn that goal into an org plan; the plan is validated and persisted; a task lands on the ledger; the runner starts in the background; and the router returns a task id immediately:

sequenceDiagram
  participant A as conversational agent
  participant R as /gk router
  participant M as author model (Mercury)
  participant L as task ledger
  participant T as Workflow.Todo runner
  A->>R: dispatch(goal)
  R->>M: turn this goal into an org TODO outline
  M-->>R: a #+TITLE header, then TODO headlines
  Note over R: validate — at least one TODO headline
  R->>R: persist to workflows/the-slug.org
  R->>L: add task — gk-201 — queued then doing
  L->>T: spawn_monitor — run the plan in runs/gk-201/
  R-->>A: dispatched true, task gk-201, the workflow path
  Note over A,R: returns in milliseconds — the conversation continues
  Note over T: minutes later, the run finishes,
queued as feedback for the next turn

The single most important arrow is second from the bottom: the router answers the agent before the runner has done any real work. Dispatch is fire-and-record, not fire-and-wait. The run executes on its own time, and the agent finds out how it went later — which is the next three sections.

the agent doesn't WRITE the plan

Notice who wrote the org outline in that diagram: not the agent. This indirection is deliberate. The voice agent never authors a workflow — it hands a goal string and nothing more. A separate, cheap author model turns that goal into a plan. The spawn lane is the self-authoring demo: a goal in one model's voice becomes a reviewable plan in another's.

The author model is set by WB_GK_AUTHOR_MODEL, defaulting to inception/mercury-2 — called at temperature 0.2, because plan-writing wants to be boring and repeatable, not creative. It gets exactly one retry: if the output isn't a valid org plan, the validation error is fed back as a user message — your previous outline was invalid; emit a corrected org outline only — and it tries again. Markdown code fences the model might wrap around the org are stripped before validation even runs.

The validation gate is one line of intent: the output must contain at least one heading matching a TODO-family keyword followed by real text (TODO, NEXT, WAITING, DOING, STARTED, BLOCKED), or it's rejected as no_todo_headings. A plan with no tasks is not a plan.

The prompt the author receives is the workflow grammar verbatim — the same contract the parent lesson teaches. We won't re-teach it here; the whole point of dispatch is that it reuses that grammar rather than inventing a job format. Here is what Mercury actually wrote for our run — a real artifact, six leaves, every leaf ordered and gated:

#+TITLE: Research E2B and Daytona Pricing

* TODO Gather E2B Pricing Information
  :PROPERTIES:
  :ORDERED: t
  :END:
  - Use `curl` to download the official E2B pricing page …
  :done-when: test -s scratch/e2b_pricing.html

* TODO Extract Pricing Details from E2B Page
  :PROPERTIES:
  :ORDERED: t
  :END:
  - Parse `scratch/e2b_pricing.html` … write to `scratch/e2b_pricing.org`.
  :done-when: test -s scratch/e2b_pricing.org

* TODO Consolidate Findings into Report
  :PROPERTIES:
  :ORDERED: t
  :END:
  - Combine the per-source files into `scratch/REPORT.org` …
  :done-when: test -s scratch/REPORT.org

Every leaf carries :ORDERED: t (run in sequence) and a :done-when: shell gate that must exit zero before the task can be DONE. The final leaf consolidates everything into scratch/REPORT.org — the contract asks for three-to-seven leaves ending in a report. And because the plan is a file before it's a process, you can read it, review it, and keep it: dispatched workflows persist into a growing library under workflows/, each filed under a slug derived from the goal.

One detail worth flagging now, because the honesty section turns on it: Mercury wrote curl into that plan. The sandbox has no curl. The author is fallible — and that fallibility is exactly why writing the plan to a reviewable file is the right move. Hold that thought.

four states, no FIFTH

The ledger is a small in-memory process — a GenServer holding tasks, feedback, and monitors. Every task is a record with an id, the goal, the plan file, a status, timestamps, and a result. The ids look like gk-201. And the status moves through exactly four values, no more. Picture the graph: a task is born queued, becomes doing the instant the run is spawned, and from there it can only reach two terminals — done if the run finishes, or blocked if it crashes or runs out of time. There is no "stuck" and no "unknown":

stateDiagram-v2
  [*] --> queued: dispatch
  queued --> doing: spawn_monitor
  doing --> done: run finished, summarized
  doing --> blocked: crashed — abnormal DOWN
  doing --> blocked: timed out after 35m
  done --> [*]
  blocked --> [*]
  

The two blocked edges are the whole reason this is trustworthy. When the run is launched, the ledger does a spawn_monitor and wraps the work in a try/catch, so a clean finish sends itself back as a result. But it also arms a kill switch: a Process.send_after timer set to 35 minutes. If that timer fires and the task is still doing, the process is killed, the status becomes blocked, and the result reads timed out after 35m. And if the worker dies without reporting — an abnormal monitor signal while doing — the task becomes blocked with the crash reason captured. A crash or timeout becomes status blocked with the reason, never a stuck doing. That invariant is the difference between a ledger you can trust and a board full of zombies.

When a run does finish cleanly, the ledger summarizes it rather than storing the raw output. A finished workflow reports its tally — workflow finished: %{"DONE" => 4, "FAILED" => 2} — the frequency of each leaf's final state. Other results are inspected and sliced to 300 characters; errors are sliced the same way and mark the task blocked. The ledger keeps a verdict, not a transcript.

hearing back: the DRAIN

So the run finishes minutes after the agent moved on. How does the answer get back into a conversation that's already three turns ahead? Not with a callback that interrupts mid-sentence — with a queue the agent drains on its own schedule. When a task reaches a terminal state, a feedback note is prepended to a list: the task id, the final status, a short summary, and the original goal. It just sits there until the agent asks.

The agent asks by calling tasks — a natural thing to do at the top of a turn. That call returns both the current task list and a drain of the feedback queue, oldest-first, and the drain clears the queue. Read-once. Picture the timeline: the run finishes while you and the agent are talking about something else; a note slips onto the queue; on the agent's next turn it drains the note and relays it to you in plain language. The loop closes at conversation speed, not interrupt speed:

sequenceDiagram
  participant U as you
  participant A as the agent
  participant L as the ledger
  participant T as the run
  U->>A: ask about something else
  T->>L: gk-201 finished — note queued
  Note over L: feedback: [gk-201 done]
  U->>A: next turn — anything at all
  A->>L: tasks
  L-->>A: tasks + drain(feedback) — and the queue clears
  A->>U: by the way, that E2B research is done
  

That is the entire loop back into the conversation. No push, no ringing phone mid-thought — the feedback waits for a natural beat. The cost of that calm is the read-once rule: the second tasks call returns an empty feedback list, because the first one already took it. Drain it, relay it, or it's gone.

a ledger you can read in GIT

Depth rung. The ledger lives in memory — but it also renders itself to a file, TASKS.org, on every change. This is the same one-way discipline the workflows lesson drew: the file is generated from the ledger and never hand-edited. There is no two-way sync, because there is only one direction. Each ledger field maps to a piece of the rendering:

ledger fieldhow it renders in TASKS.org
statusa TODO keyword on the headline — queued→TODO, doing→DOING, done→DONE, blocked→BLOCKED
goalthe headline text after the keyword
id · file · started · finisheda :PROPERTIES: drawer under the headline
resulta plain line below the drawer — the summary or the failure reason
(order)entries sorted newest-first

The verdict of that table in one sentence: the rendered file carries every fact the ledger knows, in org's own grammar, so the whole task history is greppable, diffable, and reviewable in version control. The file even declares its own keyword set with a #+TODO: line, so any org reader colours the columns correctly. This is the same move BOARD.org makes — regenerated from the issue tracker, one-direction, never hand-edited. Generated files, owned by the generator.

the orphan REWRITE

Here is the honest edge that makes the whole thing trustworthy, and it's the best detail on the page. The ledger is in-memory. So when the engine restarts, every task that was doing is gone from the GenServer — but the last rendered TASKS.org on disk still says DOING. The file would be lying about what's alive: claiming a run is in progress when no process exists.

The fix is a render correction, not a sync. On boot, before serving anything, the renderer rewrites the file in place: any headline that's still in an active state — DOING or TODO — is rewritten to BLOCKED, and a comment is injected at the top to say why. It's not pretending the work resumed; it's refusing to let a generated file lie. We own that file, so we correct it. This is our real TASKS.org after the restart that orphaned our run — note the comment on line two, and the headline that now reads BLOCKED:

#+TITLE: groundskeeper — dispatched tasks
# orphaned actives marked BLOCKED on boot
#+TODO: TODO DOING BLOCKED | DONE

* BLOCKED Research E2B (e2b.dev) and Daytona (daytona.io) pricing: …
  :PROPERTIES:
  :ID: gk-201
  :FILE: workflows/research-e2b-e2b-dev-and-daytona-daytona-3.org
  :STARTED: [2026-06-11 04:58:28]
  :END:

This is the full lifecycle in one screenshot, including the unglamorous ending. gk-201 wasn't a failure — it was orphaned on restart, and the file says so plainly rather than leaving a stale DOING to mislead the next reader. The boundary is exact: the file never claims a liveness it can't back. That's the same lesson the autopoet taught about supervised, time-bounded work — blocked-not-stuck, the reason always recorded.

what a run LEAVES behind

Depth rung. Underneath the ledger, the actual execution is the workflow runner from the parent lesson — native org as the state machine, ordered leaves as a pipeline, parallel leaves via bounded concurrency, :done-when: as the acceptance gate. Each dispatched run gets its own working directory, runs/<id>/, and leaves a durable trail in it:

  • scratch/ — a persistent shared workspace the leaves read and write across the run. For our run it holds the real extracted pricing — scratch/e2b_pricing.org has per-second numbers, down to fractions of a cent per vCPU-second.
  • _steps.jsonl — every tool call the leaf agent made, hash-chained and signed when the run is sealed, so the trail can't be quietly edited after the fact.
  • _telemetry.db — the run's telemetry, written alongside.

One rule governs the acceptance checks, and it's load-bearing: the :done-when: commands run in the in-sandbox WASM shell, never on the host. An author-supplied check is untrusted input, so it must not become a native-execution vector. The consequence is that the checks fail closed — if a check command isn't one the WASM shell knows, the task simply doesn't reach DONE. A plan can't talk its way to success by naming a command the sandbox refuses to run.

where it BITES

Honesty section. The most vivid limit is the one we flagged earlier. Mercury wrote a plan full of curl and mkdir; the WASM shell has neither. Here are two real lines from our run's _steps.jsonl — the sandbox biting the author:

{"error":"{:unknown_command, \"mkdir\"}","tool":"shell","step":1,
 "args":{"pipeline":"mkdir -p scratch && curl -sL -o scratch/daytona_pricing.html …"}}
{"error":"{:unknown_command, \"curl\"}","tool":"shell","step":0,
 "args":{"pipeline":"curl -sL -o scratch/e2b_pricing.html https://e2b.dev/pricing"}}

The plan named the wrong toolchain. But watch what happened next: the leaf agent adapted — it reached for its fetch tool instead, pulled the pages, and still produced scratch/e2b_pricing.org with real per-second pricing. The run partially succeeded despite the wrong instructions, and the gates that couldn't pass simply left their leaves un-DONE. The plan is reviewable precisely because the author is fallible, and the :done-when: gates fail closed instead of pretending.

The rest of the bites are quieter, and all real:

  • The ledger is in-memory. A restart orphans active tasks (the BLOCKED rewrite mitigates the lie, but the run itself is gone) — and it also drops any feedback that hadn't been drained yet. Undrained feedback dies with the BEAM. The rewrite saves the file's honesty, not the work.
  • Feedback is read-once and global. One queue, drained whole, with no per-conversation routing. If two conversations share a ledger, whoever drains first gets the notes.
  • Summaries truncate to 300 characters. The ledger keeps a verdict, not a transcript — for the full story you read the run's scratch/ and _steps.jsonl.
  • 35 minutes is a wall, not a suggestion. A run that needs longer is killed and marked blocked. Dispatch is for errands, not marathons.

questions people actually ASK

Why doesn't the agent author its own workflows?

Deliberate indirection. The conversational agent hands a goal; a separate author model turns it into a reviewable org plan. That keeps the conversational model focused on talking, makes the plan a file you can inspect before it runs, and is the self-authoring demo — a goal in one voice becoming a plan in another.

What happens at minute thirty-six?

The kill switch. A 35-minute timer is armed when the run starts; if it fires while the task is still doing, the process is killed, the status becomes blocked, and the result records "timed out after 35m". There's no grace period — the bound is a wall by design.

Can I hand-edit TASKS.org?

You can, but the next ledger change clobbers it — the file is rendered from the in-memory ledger on every change, one direction only. It's a generated view, like a board rendered from a plan. Edit the work, not the rendering.

Why org instead of a job queue?

Because the plan reuses the workflow grammar you already have — ordered leaves, dependencies, done-when gates — so a dispatched job is a reviewable, diffable file before it's a process. A job-queue row is none of those things.

Does feedback survive a restart?

No. Both the ledger and its feedback queue are in-memory. A restart orphans active tasks (rewritten to BLOCKED in the file so it doesn't lie) and drops any feedback that wasn't drained first. Persistence is an honest gap, stated plainly.

Is this the same as the Autopoet?

No — and the contrast is clean. The autopoet grows the system, editing its declarative layer over time. Dispatch runs your errands, one bounded job at a time. They share the same supervision lesson — time-bounded, blocked-not-stuck, the reason always recorded — but their subjects are opposite.

keep GOING

Dispatch is a seam on top of constructs you've met — start with the parent, then the worker it runs.