state lives in SILOS
Every agent system reinvents task state, and every reinvention buries it
somewhere nobody can read. A JSON queue here, a status column there, a
mark_done() API over there — three shapes for the same idea, and
the state always ends up in a database the human can't diff and the agent
can't be audited on. You ask "what happened on this task last night?" and the
honest answer is: query the silo, trust the silo, hope the silo is right.
And "done" in those systems is the thinnest fact of all — a boolean somebody set. Not a claim that was checked, not a state that survives a restart, not a line in a history you can replay. Just a flag flipped by whoever held the write token, recoverable only by trusting that they were honest and awake.
Agents need state both sides can read. The model needs to see where the work stands so it can pick up the next thing; the human needs to see what the model did so they can trust it. A column in a vendor's cloud satisfies neither — it's legible to the API and to no one else. The format that's legible to both already exists, and it's older than any of this.
the DEFINITION
1. a headline whose first word is a state — one keyword in front of a title, read by the engine as the task's position in its own state machine. The outline is the machine; the keyword is the interpreter's only input.
That's the entire mechanism. Org already had the grammar, a keyword standing in front of a headline, and Workbooks runs it literally. The engine's whole state model is one word before a title, drawn from a default set of nine; there is no board schema, no status table, no second copy. The org lesson named these states in passing as "native, not convention." This page is the full grammar behind that chip: what the words are, who is allowed to move one, what you can trust DONE to mean, and where to look the morning after a run.
One module is the engine of record: Workbooks.Workflow.Todo.
Its own moduledoc puts it in six words: the outline IS the state
machine. TODO keywords are task states. Heading nesting is the difference
between a sub-workflow and a unit of work. A property on a parent decides
whether its children run in order or all at once. Nothing here was bolted on;
it was all already in the file.
the state WORDS
The default keyword set is six active words and three done words — real constants in the engine, not a suggestion:
```elixir
@done_states ~w(DONE CANCELLED CANCELED)
@default_keywords ~w(TODO NEXT WAITING DOING STARTED BLOCKED) ++ @done_states
```
A keyword is detected by the simplest rule there is: take the first word of
the headline after the stars. If it's in the set, that's the state; if it
isn't, the state is nil and the word stays in the title. No tags,
no drawer, no annotation — position alone.
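A minimal Python sketch of that first-word rule, assuming a headline arrives as a single string with its leading stars. The function name `split_keyword` is hypothetical, the keyword sets mirror the Elixir defaults above, and the case-sensitive match is an assumption of this sketch.

```python
import re

DONE_STATES = frozenset({"DONE", "CANCELLED", "CANCELED"})
DEFAULT_KEYWORDS = frozenset({"TODO", "NEXT", "WAITING", "DOING", "STARTED", "BLOCKED"}) | DONE_STATES


def split_keyword(headline, keywords=DEFAULT_KEYWORDS):
    """Return (state, title) for an org headline such as '** NEXT write intro'.

    Position alone decides: only the first word after the stars is checked.
    If it is not in the keyword set, state is None and the word stays in the title.
    Matching is case-sensitive here (an assumption of this sketch).
    """
    text = re.sub(r"^\*+\s*", "", headline).strip()  # drop the stars
    parts = text.split(maxsplit=1)
    if parts and parts[0] in keywords:
        return parts[0], (parts[1] if len(parts) > 1 else "")
    return None, text
```

So `split_keyword("* log")` yields `(None, "log")`: a heading with no keyword simply has no state.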
You can replace the whole set with one line. A #+TODO: line in
the file swaps the keywords out — the engine parses it by replacing the
| divider with a space and splitting on whitespace. The editorial
board for bit-ml, for instance, declares its own pipeline
stages: #+TODO: ASSIGNED RESEARCH WRITING EDIT | PUBLISHED KILLED.
Agents claim work by moving through those stages.
Here is the catch, and it's the honest kind. The | divider is
cosmetic to this engine. The done-side is hard-coded to
DONE / CANCELLED / CANCELED. A custom
done keyword like PUBLISHED is a state the engine will recognize
and let you select on — but it is not a done-state, so the engine
won't skip it on resume the way it skips a real DONE. Invent states freely for
selection; just know that "finished, stop touching it" is spelled exactly
three ways.
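Both halves of the catch fit in a few lines. This sketch assumes the `#+TODO:` line is handed over as a raw string; `parse_todo_line` and `skips_on_resume` are illustrative names, not the engine's functions. The parse follows the description above (replace the divider with a space, split on whitespace), and the done check deliberately never looks at which side of the divider a word sat on.

```python
from typing import List, Optional

HARD_CODED_DONE = frozenset({"DONE", "CANCELLED", "CANCELED"})


def parse_todo_line(line: str) -> List[str]:
    """Turn '#+TODO: A B | C D' into ['A', 'B', 'C', 'D'].

    The '|' divider becomes a space, so its position is lost: the result
    carries no record of which words were meant as done words.
    """
    body = line.split(":", 1)[1]  # drop the '#+TODO' prefix
    return body.replace("|", " ").split()


def skips_on_resume(state: Optional[str]) -> bool:
    # Done-detection never consults the #+TODO line.
    return state in HARD_CODED_DONE
```

Fed the bit-ml line, `parse_todo_line` returns all six stages, PUBLISHED included, yet `skips_on_resume("PUBLISHED")` is False.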
| keyword | means | side | who sets it |
|---|---|---|---|
| TODO | queued, not started | active | author / dream put down |
| NEXT | the pick-up signal | active | author / dream pick up |
| WAITING | blocked on something external | active | author / agent |
| DOING / STARTED | in flight | active | agent (claims it) |
| BLOCKED | held — crash or unmet edge | active | agent / orphan-correction |
| DONE | finished, validated | done | the run, when the gate passes |
| CANCELLED / CANCELED | finished, abandoned | done | author / dream cancel |
who moves the WORD
Start with what the engine does not do: it never rewrites your
file's keywords. There is no write-back code in the run path. A run reads your
outline, executes the leaves, and lands its results in telemetry — your
headlines stay exactly as you wrote them. Moving a keyword is the author's
edit, or an agent's, made in a commit. The transition is a git
diff, every time.
Some movers are mechanical, though, and they're worth naming because they
show the discipline. The dreaming process emits a
* verdicts block, and the engine applies it by exact heading
match: pick up: <task> writes NEXT,
put down: <task> writes TODO,
cancel: <task> writes CANCELLED. Even the
machine moves state through the proper keywords — never by hand-picking out of
band.
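The verdict pass is a small rewrite keyed on exact titles. A hedged Python sketch, assuming verdict lines look exactly like `pick up: <task>` and that the outline is a list of `(stars, keyword, title)` tuples; both shapes, and the name `apply_verdicts`, are assumptions for illustration.

```python
VERDICT_STATES = {"pick up": "NEXT", "put down": "TODO", "cancel": "CANCELLED"}


def apply_verdicts(headings, verdict_lines):
    """Return a new heading list with verdict keywords applied.

    headings: list of (stars, keyword, title); keyword may be None.
    verdict_lines: strings like 'pick up: write intro'.
    A verdict only touches a heading whose title matches exactly.
    """
    wanted = {}
    for line in verdict_lines:
        verb, sep, task = line.partition(":")
        verb = verb.strip()
        if sep and verb in VERDICT_STATES:
            wanted[task.strip()] = VERDICT_STATES[verb]
    return [
        (stars, wanted.get(title, keyword), title)
        for stars, keyword, title in headings
    ]
```

Because the match is exact, a verdict naming `Write Intro` leaves a heading titled `write intro` untouched rather than guessing.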
```mermaid
flowchart LR
  todo["TODO"]
  next["NEXT"]
  doing["DOING"]
  done(["DONE"])
  blocked["BLOCKED"]
  cancelled(["CANCELLED"])
  todo -- "dream: pick up" --> next
  next -- "agent claims :AGENT:" --> doing
  doing -- "run: gate passes" --> done
  doing -- "crash / timeout" --> blocked
  blocked -- "agent retries" --> doing
  todo -- "dream: cancel" --> cancelled
  next -- "dream: cancel" --> cancelled
  style done fill:#13d943,stroke:#121316,stroke-width:2.5px
  style cancelled fill:#d9dbd3,stroke:#121316
  style todo fill:#f2ddb0,stroke:#121316
  style next fill:#f2ddb0,stroke:#121316
```
Read that graph as a life story. A task starts at TODO, parked. A
dream verdict — or you — promotes it to NEXT, the pick-up signal. An
agent claims it by writing :AGENT: you on the node and committing;
to every peer, a claimed task is now invisible. The claim turns it
DOING. A run that passes the gate lands it on DONE in green; a
crash or timeout drops it to BLOCKED, never a stuck DOING. And a
dream can route either queued state, TODO or NEXT, to CANCELLED, the grey terminus.
Every edge is labeled by an actor, because state never moves on its own.
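One way to make "every edge has an actor" checkable is to encode the graph as data. A Python sketch: the edge table is copied from the diagram, while `EDGES` and `actor_for` are hypothetical names. Nothing here claims the engine enforces these edges; per the section above, a keyword move is just an edit in a commit.

```python
# (from_state, to_state) -> the actor label on that edge in the diagram
EDGES = {
    ("TODO", "NEXT"): "dream: pick up",
    ("NEXT", "DOING"): "agent claims :AGENT:",
    ("DOING", "DONE"): "run: gate passes",
    ("DOING", "BLOCKED"): "crash / timeout",
    ("BLOCKED", "DOING"): "agent retries",
    ("TODO", "CANCELLED"): "dream: cancel",
    ("NEXT", "CANCELLED"): "dream: cancel",
}


def actor_for(old: str, new: str) -> str:
    """Return the label of the edge old -> new, or raise if the diagram has no such edge."""
    try:
        return EDGES[(old, new)]
    except KeyError:
        raise ValueError(f"no edge {old} -> {new} in the diagram") from None
```

Note that DONE and CANCELLED appear only as targets: the two termini have no outgoing edges.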
Generated mirrors are the one place keywords appear without a human moving
them — and they're one-way renders, not syncs. The groundskeeper's
TASKS.org and BOARD.org are drawn from a
ledger, never hand-edited; a status map turns
in_progress→DOING, closed→DONE and stamps inactive timestamps.
When the runtime reboots, orphan-correction rewrites any stale
* DOING or * TODO to * BLOCKED — a crash
becomes a held task, never a lie about active work.
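Both mechanical movers are one-way string rewrites. A Python sketch under stated assumptions: only the `in_progress` and `closed` mappings come from the text, so any other ledger status is rendered without a keyword here; the inactive timestamps are left out; "stale" is taken to mean every DOING/TODO headline at boot; and `render_line` / `correct_orphans` are illustrative names.

```python
import re

STATUS_MAP = {"in_progress": "DOING", "closed": "DONE"}  # the two mappings named above


def render_line(status: str, title: str) -> str:
    """Render one ledger item as a top-level org headline (one-way: never parsed back)."""
    keyword = STATUS_MAP.get(status)
    return f"* {keyword} {title}" if keyword else f"* {title}"


# A headline whose keyword is exactly DOING or TODO (not e.g. TODOLIST).
ORPHAN = re.compile(r"^(\*+) (DOING|TODO)(?= |$)", re.MULTILINE)


def correct_orphans(org_text: str) -> str:
    """On boot: rewrite every DOING/TODO headline to BLOCKED, keeping stars and title."""
    return ORPHAN.sub(r"\1 BLOCKED", org_text)
```

Done headlines and ordinary words that merely start with TODO pass through unchanged.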
what a run does with STATES
When the engine runs the outline, it walks each heading and decides one of a few things. The decision tree is small enough to hold in your head:
```mermaid
flowchart TD
  start["a heading"]
  isdone{"already a done-state?"}
  haskids{"has children?"}
  skip["skip: record DONE, output (already DONE)"]
  gate{"validation passes?"}
  ld["leaf → DONE"]
  lf["leaf → FAILED"]
  allkids{"all children ended DONE?"}
  cd["composite → DONE"]
  cp["composite → PARTIAL"]
  start --> isdone
  isdone -- yes --> skip
  isdone -- no --> haskids
  haskids -- "no (leaf)" --> gate
  gate -- yes --> ld
  gate -- no --> lf
  haskids -- "yes (composite)" --> allkids
  allkids -- yes --> cd
  allkids -- no --> cp
  style ld fill:#13d943,stroke:#121316,stroke-width:2px
  style cd fill:#13d943,stroke:#121316,stroke-width:2px
  style lf fill:#f3c5a3,stroke:#121316
  style cp fill:#f3c5a3,stroke:#121316
  style skip fill:#d9dbd3,stroke:#121316
```
Trace it. First question: is this heading already in a done-state? If yes,
it's skipped and recorded as DONE with the output (already
DONE) — and that single fact is what makes a run resumable:
re-running the same outline picks up exactly where it left off, because
finished work is simply passed over. (A file-level CANCELLED is
normalized to DONE in the record — abandoned still counts as
settled.) If it's not done, the next question is whether it has children. A
leaf — no children — runs its work and reaches DONE if its
validation passes, else FAILED. A composite — a heading with
children — reaches DONE only if all its children ended DONE,
else PARTIAL. So the run-result vocabulary is exactly three words:
DONE | FAILED | PARTIAL.
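The decision tree translates almost line for line into a recursive function. A minimal Python sketch, assuming a `Node` with a title, an optional keyword, and children, plus a `run_leaf` callback that returns True when the leaf's validation passes; these names are hypothetical, records are simplified to `(title, state, output)` triples, and the sketch runs children sequentially, ignoring `:ORDERED:` and `:BLOCKER:`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

DONE_STATES = {"DONE", "CANCELLED", "CANCELED"}


@dataclass
class Node:
    title: str
    keyword: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def run(node: Node, run_leaf: Callable[[Node], bool], out: list) -> str:
    """Settle one heading, append (title, state, output) to out, and return the state."""
    if node.keyword in DONE_STATES:
        # Resumability: finished work is passed over, and CANCELLED is recorded as DONE.
        out.append((node.title, "DONE", "(already DONE)"))
        return "DONE"
    if not node.children:
        # A leaf, including a bare heading with no keyword (None is not a done-state).
        state = "DONE" if run_leaf(node) else "FAILED"
        out.append((node.title, state, ""))
        return state
    kid_states = [run(child, run_leaf, out) for child in node.children]
    state = "DONE" if all(s == "DONE" for s in kid_states) else "PARTIAL"
    out.append((node.title, state, ""))
    return state
```

Running the same tree twice shows the resume property: anything already marked DONE in the file is recorded without calling `run_leaf` again.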
Two properties shape how children run. :ORDERED: t on a parent
makes its children sequential — a pipeline. Its absence makes them parallel,
run with Task.async_stream, eight at a time, with a
thirty-minute ceiling per task. And :BLOCKER: is an
explicit wait edge — this task holds until that one is done. Note one thing
the dossier is blunt about: every heading is a node. A heading with no
keyword and no children still runs as a leaf, because nil isn't a
done-state. There's no :TASK:-tag filter — if you POST a raw board
with a * log heading, the log heading runs too.
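The ordered/parallel split can be sketched in Python, with a thread pool standing in for `Task.async_stream`. This is an approximation, not the engine's code: `run_children` and `run_one` are hypothetical names, `:BLOCKER:` edges are not modeled, and Python's timeout semantics differ from Elixir's, as the comment notes.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 8       # "eight at a time"
TASK_TIMEOUT_S = 30 * 60  # the thirty-minute ceiling


def run_children(children, run_one, ordered: bool):
    """Run child tasks and return their results in the children's order.

    ordered=True  -> a pipeline: each child starts after the previous one returns.
    ordered=False -> up to MAX_CONCURRENCY children run at once.
    """
    if ordered:
        return [run_one(child) for child in children]
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        futures = [pool.submit(run_one, child) for child in children]
        # result(timeout=...) waits up to TASK_TIMEOUT_S for each result in turn and
        # raises TimeoutError if that runs out; unlike async_stream, it cannot stop
        # the slow task itself, since Python threads cannot be killed.
        return [f.result(timeout=TASK_TIMEOUT_S) for f in futures]
```

Either way, results come back in outline order, which keeps the later sort by index trivial.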
what DONE actually RECORDS
Depth rung — skippable, but it's where DONE stops being a feeling and becomes a row. The engine never edits your file; what it produces is a record per task, and the run result is just those records sorted by index. Each one has exactly these fields:
| field | what it holds |
|---|---|
| id | the title slugged: lowercased, non-alphanumerics → -, capped at 48 chars |
| idx | position in the run; the sort key for the result |
| title | the headline text, keyword stripped |
| state | one of DONE / FAILED / PARTIAL |
| output | the agent's result, truncated to 600 chars |
| ts | unix seconds when it settled |
The quiet detail is that id keys off the title, not
the position. State follows the words, not the line number — reorder your
headlines and the same task keeps its identity. That's also why a run is
resumable across edits: the engine recognizes a task by who it is, not where
it sits.
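A hedged Python sketch of one record and the slug rule. The field names come from the table; `slug`, `make_record`, and `run_result` are hypothetical helpers. Two details are assumptions: "alphanumeric" is read as ASCII `a-z0-9`, and runs of other characters collapse to a single `-` with edge dashes trimmed.

```python
import re
import time
from dataclasses import dataclass


@dataclass
class TaskRecord:
    id: str
    idx: int
    title: str
    state: str   # DONE | FAILED | PARTIAL
    output: str
    ts: int


def slug(title: str) -> str:
    # Assumption: runs of non-alphanumerics collapse to one '-', then the edges are trimmed.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:48]


def make_record(idx: int, title: str, state: str, output: str) -> TaskRecord:
    return TaskRecord(slug(title), idx, title, state, output[:600], int(time.time()))


def run_result(records):
    # The run result is just the records sorted by position.
    return sorted(records, key=lambda r: r.idx)
```

Since `id` depends only on the title, `make_record(0, "Fix build", ...)` and `make_record(5, "Fix build", ...)` share the id `fix-build`.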
DONE is a claim until CHECKED
A leaf reaches DONE only if its validation passes — this is the DONE gate,
and it's the difference between a boolean somebody set and a fact something
proved. The gate is a :DONE-WHEN: property (or a sh
:check source block in the body). The check runs in the
WASM shell, over the command registry — never native OS bash — and it's
fail-closed: the engine only believes a pass if it sees the sentinel
__WB_CHECK_OK__ in the output. No sentinel, no DONE.
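Only the reading side of the gate is modeled here: how the check emits the sentinel inside the WASM shell is not shown. In this Python sketch, `leaf_state` is a hypothetical name, `check_output` is None when the heading has no check, and two behaviors are assumptions: a failed self-report with no check maps to FAILED, and when a check exists, only the check decides.

```python
from typing import Optional

SENTINEL = "__WB_CHECK_OK__"


def leaf_state(agent_says_done: bool, check_output: Optional[str]) -> str:
    """Decide a leaf's state, fail-closed.

    check_output is None when there is no :DONE-WHEN: or :check block;
    then the agent's own completion stands (self-graded).
    Otherwise only the sentinel counts as a pass: empty output, an error,
    or a cheerful 'all tests passed' without the sentinel all fail.
    """
    if check_output is None:
        return "DONE" if agent_says_done else "FAILED"
    return "DONE" if SENTINEL in check_output else "FAILED"
```

The asymmetry is the point: prose that merely sounds like success never passes, because the function looks for one exact token and nothing else.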
If there's no check at all, the agent's own completion stands — which means a no-check DONE is the agent grading its own homework. That's an honest limit, not a bug, and it has its own full treatment: the validations deep dive walks the whole DONE-WHEN ladder, from no-gate to test-gated. The one-line version: DONE is only as trustworthy as the check behind it.
how a change becomes HISTORY
A state change doesn't vanish into a database — it falls onto three rails at once, each readable by a different audience. Here is the path a single keyword-move takes, from an agent's edit to a signed ledger:
```mermaid
sequenceDiagram
  participant A as agent / author
  participant G as git
  participant T as _telemetry.db
  participant L as _ledger.json
  A->>G: move keyword + one dated line under * log → commit
  Note over G: the git log IS the board's history
  A->>T: run ends → persist task_events + step_events
  Note over T: every task record, every tool call
  A->>L: seal: hash-chain _steps.jsonl, sign head
  Note over L: h_i = sha256(h_{i-1} ++ raw_line_i)
  L->>G: anchor: commit the ledger into the tenant repo
```
Walk the three rails as the diagram tells them. First, git: the
protocol is to record progress by moving state, never by deleting, and to add
one dated line under a * log heading in the same commit. Claude Code
reads the board at session start, and the git log is its memory. Second,
telemetry: every run persists to _telemetry.db, a
task_events table holding each task record and a
step_events table ingested from an always-on
_steps.jsonl — every tool call the agent made, logged at the
chokepoint. Third, the sealed ledger: Workbooks.Ledger.seal
runs at the end of every Todo.run, hash-chaining the raw lines of
_steps.jsonl — h_i = sha256(h_{i-1} ++ raw_line_i),
genesis constant workbooks-ledger-v1 — and signing the head with
the tenant's Ed25519 did:key. verify returns two properties,
tamper-evident and attributable; anchor commits
the ledger back into the tenant repo. The full seal/verify/anchor story is its
own page — the ledger. This page only shows the
state-to-history hop.
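The chaining half of the seal can be sketched with the standard library. The recurrence and the genesis string come from the text; the rest is assumption, labeled in the comments: `h_0` is taken to be the raw bytes of the genesis constant, `++` is byte concatenation of the previous digest with the line, and lines are hashed without their trailing newline. The Ed25519 signature over the head is left out because it needs a third-party crypto library; `chain_head`, `read_steps`, and `tamper_evident` are hypothetical names.

```python
import hashlib
from typing import List

GENESIS = b"workbooks-ledger-v1"


def chain_head(raw_lines: List[bytes]) -> bytes:
    """Fold h_i = sha256(h_{i-1} ++ raw_line_i) over the lines."""
    h = GENESIS  # assumption: h_0 is the genesis constant's bytes
    for line in raw_lines:
        h = hashlib.sha256(h + line).digest()
    return h


def read_steps(path: str) -> List[bytes]:
    # Raw bytes per line (trailing newline dropped, an assumption), with no JSON
    # parsing, so even a re-serialised but equivalent record changes the chain.
    with open(path, "rb") as f:
        return f.read().splitlines()


def tamper_evident(raw_lines: List[bytes], sealed_head: bytes) -> bool:
    return chain_head(raw_lines) == sealed_head
```

Altering any byte of any line changes every hash after it, so a single comparison at the head detects the edit.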
reading the RECORD
Depth rung. The morning after a run, three commands answer three questions: what ran, what broke, and whether anyone edited the story. Real output shapes, from the CLI:
```
$ wb telemetry
SLUG     STAGE  CALLS  ERRORS  MS
wf-1842  done   41     1       183204
wf-1791  done   12     0       60113

$ wb telemetry wf-1842
stage=done calls=41 errors=1 total_ms=183204
! step 17 bash: exit 1

$ wb ledger wf-1842
tamper-evident=ok attributable=ok count=41 did=did:key:z6Mk…
```
The first command is what ran — a runs index, one line per run, with
call counts, error counts, and total milliseconds. The second is what
broke: stage, calls, errors, and a line per failed step. The third
proves nobody edited the story: it re-walks the hash chain over
_steps.jsonl and confirms the head still matches its did:key
signature. If a single byte of history were altered, tamper-evident
would not say ok.
Over HTTP it's the same record, live. POST /api/workflow/todo
with {"org": "..."} returns 202 and a slug
(wf-<n>); the run lands in /tmp/bb/<slug>,
and you poll _status.json for per-task stages —
running, done, or error.
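The polling side can be sketched without a server by reading the status file directly. Only the `/tmp/bb/<slug>` location and the three stage names come from the text; this Python sketch assumes `_status.json` is a flat JSON object mapping task ids to a stage, which may not match the real file, and `read_stages` / `run_settled` are hypothetical helpers.

```python
import json
import os

STAGES = {"running", "done", "error"}


def read_stages(workdir: str) -> dict:
    """Load per-task stages from <workdir>/_status.json (assumed shape: {task_id: stage})."""
    with open(os.path.join(workdir, "_status.json"), encoding="utf-8") as f:
        stages = json.load(f)
    unknown = set(stages.values()) - STAGES
    if unknown:
        raise ValueError(f"unexpected stages: {sorted(unknown)}")
    return stages


def run_settled(stages: dict) -> bool:
    # Settled once nothing is still running; "error" counts as settled, not as success.
    return all(stage != "running" for stage in stages.values())
```

A poller would call `read_stages` on the run's workdir in a loop and stop once `run_settled` is true, then turn to the telemetry for the details.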
where it BITES
Honesty section. This grammar is sharp in a few places, and pretending otherwise would betray the whole point of a readable plan.
The divider is cosmetic. Said once already, worth saying twice: the
| in a #+TODO: line is decoration to the run engine.
The done-set is hard-coded to DONE/CANCELLED/CANCELED. Custom done keywords are
selectable states, not skip-on-resume states.
Two parsers disagree. The Elixir workflow engine has its own
self-contained parser with the full default keyword set and the
#+TODO: override. The kernel — wb query <file.org>
— uses orgize with its default config, whose todo keywords are exactly
TODO and DONE. So wb query reports
state only for literal TODO/DONE; a NEXT or
DOING keyword stays inside the title as far as the kernel is
concerned. The two surfaces read the same file and see different states —
know which one you're asking.
Every heading is a node. There is no task-tag filter. POST a raw
board and your * log heading runs as a leaf like everything else.
Boards meant to be rendered are not the same shape as outlines meant
to be run.
A no-check DONE is self-grading. Without a :DONE-WHEN:,
the agent's word is the only evidence. The
autopoet hit this honesty-gate directly — a self-report
of completion is a hypothesis, not a result, until something independent
confirms it. The fix is a gate, not a policy.
Orphans become BLOCKED. A crash or a timeout mid-run leaves a stale
active state. On boot, the groundskeeper rewrites stale DOING/
TODO headlines to BLOCKED — by design, a crash
becomes a held task you can see, never a stuck DOING that lies
about active work.
questions people actually ASK
Can I invent my own states?
Yes, for selection — a #+TODO: line replaces the whole
keyword set, and agents can claim work by moving through your custom stages.
But done-detection stays DONE/CANCELLED/CANCELED. A custom done word
like PUBLISHED is a state the engine recognizes, not a state it skips on
resume.
Does the engine update my file?
No. There is no write-back code in the run path. Results land in telemetry and the ledger; your headlines stay exactly as you wrote them. Moving a keyword is your edit, or an agent's, in a commit — which is precisely why the git diff is the audit trail.
What's the difference from bd / beads?
Two ledgers, never crossed. bd is the issue tracker — a separate system of
record with its own database. The to-do grammar is state living in the plan
file itself, with git as its history. Rendered mirrors like
BOARD.org can draw from bd, but that's a one-way render,
not a sync. They stay distinct on purpose.
Is DONE trustworthy?
Only as strong as its gate. A DONE with a passing :DONE-WHEN:
check — run fail-closed in the WASM shell — is a proven fact. A DONE with no
check is the agent grading its own homework. The state word is the same; the
evidence behind it is not. See validations.
Where do I look the morning after a run?
Three commands. wb telemetry for what ran,
wb telemetry <slug> for what broke,
wb ledger <slug> to prove nobody edited the history.
All three read files the run left in its workdir — no server required to ask
the question.
Why does wb query miss my NEXT tasks?
Because it's the kernel parser, not the workflow engine. orgize's default config only knows TODO and DONE, so a NEXT or DOING keyword stays inside the title there. The Elixir runner knows the full set. Same file, two vocabularies — a real drift, named honestly.
keep GOING
This page is the grammar behind one chip on the org lesson — here's where it connects.