to-dos — a headline whose first word is a state

state lives in SILOS

Every agent system reinvents task state, and every reinvention buries it somewhere nobody can read. A JSON queue here, a status column there, a mark_done() API over there: three shapes for the same idea, and the state always ends up in a database the human can't diff and the agent's work can't be audited against. You ask "what happened on this task last night?" and the honest answer is: query the silo, trust the silo, hope the silo is right.

And "done" in those systems is the thinnest fact of all — a boolean somebody set. Not a claim that was checked, not a state that survives a restart, not a line in a history you can replay. Just a flag flipped by whoever held the write token, recoverable only by trusting that they were honest and awake.

Agents need state both sides can read. The model needs to see where the work stands so it can pick up the next thing; the human needs to see what the model did so they can trust it. A column in a vendor's cloud satisfies neither — it's legible to the API and to no one else. The format that's legible to both already exists, and it's older than any of this.

the DEFINITION

to·do /ˈtuː·duː/ noun

1. a headline whose first word is a state — one keyword in front of a title, read by the engine as the task's position in its own state machine. The outline is the machine; the keyword is the interpreter's only input.

That's the entire mechanism. Org already had the grammar (a keyword standing in front of a headline) and Workbooks runs it literally. The engine's whole state model is one keyword before a title, drawn from a default set of nine; there is no board schema, no status table, no second copy. The org lesson named these states in passing as "native, not convention." This page is the full grammar behind that chip: what the words are, who is allowed to move one, what you can trust DONE to mean, and where to look the morning after a run.

One module is the engine of record: Workbooks.Workflow.Todo. Its own moduledoc puts it in six words: the outline IS the state machine. TODO keywords are task states. Heading nesting is the difference between a sub-workflow and a unit of work. A property on a parent decides whether its children run in order or all at once. Nothing here was bolted on; it was all already in the file.

the state WORDS

The default keyword set is six active words and three done words — real constants in the engine, not a suggestion:

@done_states     ~w(DONE CANCELLED CANCELED)
@default_keywords ~w(TODO NEXT WAITING DOING STARTED BLOCKED) ++ @done_states

A keyword is detected by the simplest rule there is: take the first word of the headline after the stars. If it's in the set, that's the state; if it isn't, the state is nil and the word stays in the title. No tags, no drawer, no annotation — position alone.
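The detection rule is small enough to sketch. What follows is a minimal Python illustration, not the engine's Elixir: the function name `parse_headline` and the case-sensitive match are assumptions made for the example; only the keyword lists come from the constants above.

```python
import re

# Keyword lists copied from the constants quoted above.
DONE_STATES = ["DONE", "CANCELLED", "CANCELED"]
DEFAULT_KEYWORDS = ["TODO", "NEXT", "WAITING", "DOING", "STARTED", "BLOCKED"] + DONE_STATES


def parse_headline(line, keywords=DEFAULT_KEYWORDS):
    """Return (depth, state, title) for an org headline, or None for any other line.

    Hypothetical helper: the real parser's name and return shape are not
    documented on this page.
    """
    match = re.match(r"^(\*+)\s+(.*)$", line)
    if match is None:
        return None
    stars, rest = match.groups()
    first, _, remainder = rest.partition(" ")
    if first in keywords:
        # First word after the stars is in the set: that word is the state.
        return len(stars), first, remainder.strip()
    # Otherwise the state is None and the word stays in the title.
    return len(stars), None, rest.strip()
```

Position alone decides: `** NEXT ship the parser` yields the state NEXT, while `* Review notes` yields no state and keeps "Review" in the title.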

You can replace the whole set with one line. A #+TODO: line in the file swaps the keywords out — the engine parses it by replacing the | divider with a space and splitting on whitespace. The editorial board for bit-ml, for instance, declares its own pipeline stages: #+TODO: ASSIGNED RESEARCH WRITING EDIT | PUBLISHED KILLED. Agents claim work by moving through those stages.

Here is the catch, and it's the honest kind. The | divider is cosmetic to this engine. The done-side is hard-coded to DONE / CANCELLED / CANCELED. A custom done keyword like PUBLISHED is a state the engine will recognize and let you select on — but it is not a done-state, so the engine won't skip it on resume the way it skips a real DONE. Invent states freely for selection; just know that "finished, stop touching it" is spelled exactly three ways.
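A short Python sketch makes the catch concrete. It assumes the parsing described above (swap the divider for a space, split on whitespace) and a fixed done-set; the helper names are hypothetical and the real engine is Elixir.

```python
# The done-side is fixed, whatever the #+TODO: line says.
DONE_STATES = {"DONE", "CANCELLED", "CANCELED"}


def keywords_from_todo_line(line):
    """Parse a '#+TODO:' line into a flat keyword list (hypothetical helper)."""
    prefix = "#+TODO:"
    if not line.startswith(prefix):
        return None
    # The divider carries no meaning here: it becomes a space before splitting.
    return line[len(prefix):].replace("|", " ").split()


def is_done_state(keyword):
    """True only for the three hard-coded spellings."""
    return keyword in DONE_STATES


keywords = keywords_from_todo_line(
    "#+TODO: ASSIGNED RESEARCH WRITING EDIT | PUBLISHED KILLED"
)
```

Under these assumptions PUBLISHED lands in `keywords`, so it is a selectable state, but `is_done_state("PUBLISHED")` is false, so a resumed run would not pass over it.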

| keyword | means | side | who sets it |
| --- | --- | --- | --- |
| TODO | queued, not started | active | author / dream `put down` |
| NEXT | the pick-up signal | active | author / dream `pick up` |
| WAITING | blocked on something external | active | author / agent |
| DOING / STARTED | in flight | active | agent (claims it) |
| BLOCKED | held: crash or unmet edge | active | agent / orphan-correction |
| DONE | finished, validated | done | the run, when the gate passes |
| CANCELLED / CANCELED | finished, abandoned | done | author / dream `cancel` |

who moves the WORD

Start with what the engine does not do: it never rewrites your file's keywords. There is no write-back code in the run path. A run reads your outline, executes the leaves, and lands its results in telemetry — your headlines stay exactly as you wrote them. Moving a keyword is the author's edit, or an agent's, made in a commit. The transition is a git diff, every time.

Some movers are mechanical, though, and they're worth naming because they show the discipline. The dreaming process emits a * verdicts block, and the engine applies it by exact heading match: pick up: <task> writes NEXT, put down: <task> writes TODO, cancel: <task> writes CANCELLED. Even the machine moves state through the proper keywords — never by hand-picking out of band.
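As a rough illustration of that verdict step, here is a Python sketch. The verb-to-keyword mapping is from the text; everything else is an assumption: the function name, the idea that verdicts arrive as plain `verb: task` strings, and the way the headline is rewritten.

```python
import re

# Mapping stated in the text: verdict verb -> keyword written.
VERDICT_KEYWORDS = {"pick up": "NEXT", "put down": "TODO", "cancel": "CANCELLED"}
KNOWN_KEYWORDS = {
    "TODO", "NEXT", "WAITING", "DOING", "STARTED", "BLOCKED",
    "DONE", "CANCELLED", "CANCELED",
}


def apply_verdicts(org_lines, verdict_lines):
    """Rewrite the keyword on headlines whose title exactly matches a verdict.

    Hypothetical sketch: the real verdict block format is not shown on this page.
    """
    moves = {}
    for verdict in verdict_lines:
        verb, sep, task = verdict.partition(":")
        if sep and verb.strip() in VERDICT_KEYWORDS:
            moves[task.strip()] = VERDICT_KEYWORDS[verb.strip()]

    result = []
    for line in org_lines:
        match = re.match(r"^(\*+)\s+(.*)$", line)
        if match:
            stars, rest = match.groups()
            first, _, remainder = rest.partition(" ")
            title = remainder.strip() if first in KNOWN_KEYWORDS else rest.strip()
            if title in moves:  # exact heading match only
                line = f"{stars} {moves[title]} {title}"
        result.append(line)
    return result
```

A verdict naming a task that matches no heading changes nothing, which is the point of exact matching: a near miss is a no-op rather than a guess.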

flowchart LR
  todo["TODO"]
  next["NEXT"]
  doing["DOING"]
  done(["DONE"])
  blocked["BLOCKED"]
  cancelled(["CANCELLED"])
  todo -- "dream: pick up" --> next
  next -- "agent claims :AGENT:" --> doing
  doing -- "run: gate passes" --> done
  doing -- "crash / timeout" --> blocked
  blocked -- "agent retries" --> doing
  todo -- "dream: cancel" --> cancelled
  next -- "dream: cancel" --> cancelled
  style done fill:#13d943,stroke:#121316,stroke-width:2.5px
  style cancelled fill:#d9dbd3,stroke:#121316
  style todo fill:#f2ddb0,stroke:#121316
  style next fill:#f2ddb0,stroke:#121316

Read that graph as a life story. A task starts at TODO, parked. A dream verdict (or you) promotes it to NEXT, the pick-up signal. An agent claims it by writing :AGENT: you on the node and committing; to every peer, a claimed task is now invisible. The claim turns it DOING. A run that passes the gate lands it on DONE in green; a crash or timeout drops it to BLOCKED, never a stuck DOING, and an agent retry moves it back to DOING. And a dream can route either queued state, TODO or NEXT, to CANCELLED, the grey terminus. Every edge is labeled by an actor, because state never moves on its own.

Generated mirrors are the one place keywords appear without a human moving them — and they're one-way renders, not syncs. The groundskeeper's TASKS.org and BOARD.org are drawn from a ledger, never hand-edited; a status map turns in_progress→DOING, closed→DONE and stamps inactive timestamps. When the runtime reboots, orphan-correction rewrites any stale * DOING or * TODO to * BLOCKED — a crash becomes a held task, never a lie about active work.

what a run does with STATES

When the engine runs the outline, it walks each heading and decides one of a few things. The decision tree is small enough to hold in your head:

flowchart TD
  start["a heading"]
  done?{"already a<br/>done-state?"}
  leaf?{"has children?"}
  skip["skip, record DONE<br/>output: (already DONE)"]
  gate{"validation<br/>passes?"}
  ld["leaf → DONE"]
  lf["leaf → FAILED"]
  allkids{"all children<br/>ended DONE?"}
  cd["composite → DONE"]
  cp["composite → PARTIAL"]
  start --> done?
  done? -- yes --> skip
  done? -- no --> leaf?
  leaf? -- "no (leaf)" --> gate
  gate -- yes --> ld
  gate -- no --> lf
  leaf? -- "yes (composite)" --> allkids
  allkids -- yes --> cd
  allkids -- no --> cp
  style ld fill:#13d943,stroke:#121316,stroke-width:2px
  style cd fill:#13d943,stroke:#121316,stroke-width:2px
  style lf fill:#f3c5a3,stroke:#121316
  style cp fill:#f3c5a3,stroke:#121316
  style skip fill:#d9dbd3,stroke:#121316

Trace it. First question: is this heading already in a done-state? If yes, it's skipped and recorded as DONE with the output (already DONE) — and that single fact is what makes a run resumable: re-running the same outline picks up exactly where it left off, because finished work is simply passed over. (A file-level CANCELLED is normalized to DONE in the record — abandoned still counts as settled.) If it's not done, the next question is whether it has children. A leaf — no children — runs its work and reaches DONE if its validation passes, else FAILED. A composite — a heading with children — reaches DONE only if all its children ended DONE, else PARTIAL. So the run-result vocabulary is exactly three words: DONE | FAILED | PARTIAL.
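The same decision tree reads naturally as one recursive function. This is a Python sketch under stated assumptions, not the engine's code: `Node`, `settle`, and the `run_leaf` callback are hypothetical names, and what a composite reports as its output is a guess (an empty string here).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

DONE_STATES = {"DONE", "CANCELLED", "CANCELED"}


@dataclass
class Node:
    title: str
    state: Optional[str] = None           # keyword in the file, or None
    children: List["Node"] = field(default_factory=list)


def settle(node: Node, run_leaf: Callable[[Node], Tuple[bool, str]]) -> Tuple[str, str]:
    """Return (run_state, output) for one heading.

    run_leaf(node) is a stand-in for "run the work, then validate";
    it returns (validation_passed, output).
    """
    if node.state in DONE_STATES:
        # Finished work is passed over; CANCELLED is recorded as DONE too.
        return "DONE", "(already DONE)"
    if not node.children:
        passed, output = run_leaf(node)
        return ("DONE" if passed else "FAILED"), output
    child_states = [settle(child, run_leaf)[0] for child in node.children]
    all_done = all(state == "DONE" for state in child_states)
    return ("DONE" if all_done else "PARTIAL"), ""  # composite output: assumption
```

Note what the first branch implies: a heading whose state is None is not in the done-set, so it falls through and runs like any other leaf.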

Two properties shape how children run. :ORDERED: t on a parent makes its children sequential — a pipeline. Its absence makes them parallel, run with Task.async_stream at eight wide with a thirty-minute ceiling per task. And :BLOCKER: is an explicit wait edge — this task holds until that one is done. Note one thing the dossier is blunt about: every heading is a node. A heading with no keyword and no children still runs as a leaf, because nil isn't a done-state. There's no :TASK:-tag filter — if you POST a raw board with a * log heading, the log heading runs too.
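To see the ordered/parallel split in miniature, here is a Python sketch using a thread pool in place of Task.async_stream. The width and ceiling are the figures quoted above; the function name is hypothetical, the timeout only approximates a per-task ceiling (it bounds the wait and does not kill the work), and :BLOCKER: edges are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 8         # "eight wide"
TIMEOUT_SECONDS = 30 * 60   # thirty-minute ceiling per task


def run_children(children, run_one, ordered):
    """Run child tasks and return their results in outline order.

    ordered=True mirrors ':ORDERED: t' (a pipeline); ordered=False mirrors
    its absence (parallel). Whether a real pipeline stops at the first
    failure is not stated on this page; this sketch keeps going.
    """
    if ordered:
        return [run_one(child) for child in children]
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        futures = [pool.submit(run_one, child) for child in children]
        # Collect in submission order so results line up with the outline.
        return [future.result(timeout=TIMEOUT_SECONDS) for future in futures]
```

Either way the results come back in the order the children appear under the parent, so the choice changes timing, not the shape of the record.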

what DONE actually RECORDS

Depth rung — skippable, but it's where DONE stops being a feeling and becomes a row. The engine never edits your file; what it produces is a record per task, and the run result is just those records sorted by index. Each one has exactly these fields:

| field | what it holds |
| --- | --- |
| `id` | the title slugged: lowercased, non-alphanumerics → `-`, capped at 48 chars |
| `idx` | position in the run, the sort key for the result |
| `title` | the headline text, keyword stripped |
| `state` | one of `DONE` / `FAILED` / `PARTIAL` |
| `output` | the agent's result, truncated to 600 chars |
| `ts` | unix seconds when it settled |

The quiet detail is that id keys off the title, not the position. State follows the words, not the line number — reorder your headlines and the same task keeps its identity. That's also why a run is resumable across edits: the engine recognizes a task by who it is, not where it sits.
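A small Python sketch shows why identity follows the words. The slug rule and field list come from the table above; the function names are hypothetical, and two details are assumptions: each non-alphanumeric character becomes its own `-` (runs are not collapsed), and only ASCII letters and digits count as alphanumeric.

```python
import re
import time


def task_id(title, max_len=48):
    """Slug a title: lowercase, non-alphanumerics to '-', capped at 48 chars."""
    return re.sub(r"[^a-z0-9]", "-", title.lower())[:max_len]


def make_record(idx, title, state, output, now=None):
    """Build one task record with the six fields listed above (hypothetical helper)."""
    return {
        "id": task_id(title),
        "idx": idx,
        "title": title,
        "state": state,              # DONE / FAILED / PARTIAL
        "output": output[:600],      # truncated to 600 chars
        "ts": int(time.time() if now is None else now),
    }


def run_result(records):
    """The run result is the records sorted by index."""
    return sorted(records, key=lambda record: record["idx"])
```

Reordering headlines changes `idx` but leaves `id` untouched, because `id` is computed from the title alone.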

DONE is a claim until CHECKED

A leaf reaches DONE only if its validation passes — this is the DONE gate, and it's the difference between a boolean somebody set and a fact something proved. The gate is a :DONE-WHEN: property (or a sh :check source block in the body). The check runs in the WASM shell, over the command registry — never native OS bash — and it's fail-closed: the engine only believes a pass if it sees the sentinel __WB_CHECK_OK__ in the output. No sentinel, no DONE.

If there's no check at all, the agent's own completion stands — which means a no-check DONE is the agent grading its own homework. That's an honest limit, not a bug, and it has its own full treatment: the validations deep dive walks the whole DONE-WHEN ladder, from no-gate to test-gated. The one-line version: DONE is only as trustworthy as the check behind it.
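Fail-closed is easiest to see in code. The Python sketch below assumes the check's output reaches the gate as a string; the function name and signature are hypothetical, and how the sentinel gets printed by a passing check is not covered here.

```python
SENTINEL = "__WB_CHECK_OK__"


def gate_passes(check_output, agent_says_done):
    """Decide whether a leaf may be recorded as DONE.

    check_output is None when the task declares no check at all;
    otherwise it is whatever the check printed.
    """
    if check_output is None:
        # No gate: the agent's own completion stands (self-graded).
        return agent_says_done
    # Fail-closed: only the sentinel counts as a pass. A clean-looking
    # run without it is still a failure.
    return SENTINEL in check_output
```

The asymmetry is deliberate in this sketch: with a check present, the agent's opinion is ignored entirely, and an empty or truncated output fails rather than passes.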

how a change becomes HISTORY

A state change doesn't vanish into a database — it falls onto three rails at once, each readable by a different audience. Here is the path a single keyword-move takes, from an agent's edit to a signed ledger:

sequenceDiagram
  participant A as agent / author
  participant G as git
  participant T as _telemetry.db
  participant L as _ledger.json
  A->>G: move keyword + one dated line under * log → commit
  Note over G: the git log IS the board's history
  A->>T: run ends → persist task_events + step_events
  Note over T: every task record, every tool call
  A->>L: seal — hash-chain _steps.jsonl, sign head
  Note over L: h_i = sha256(h_i-1 ++ raw_line_i)
  L->>G: anchor — commit the ledger into the tenant repo

Walk the three rails as the diagram tells them.

First, git: the protocol is to record by moving state, never deleting, in the same commit, plus one dated line under a * log heading. Claude Code reads the board at session start and the git log is its memory.

Second, telemetry: every run persists to _telemetry.db, with a task_events table holding each task record and a step_events table ingested from an always-on _steps.jsonl (every tool call the agent made, logged at the chokepoint).

Third, the sealed ledger: Workbooks.Ledger.seal runs at the end of every Todo.run, hash-chaining the raw lines of _steps.jsonl as h_i = sha256(h_{i-1} ++ raw_line_i), with genesis constant workbooks-ledger-v1, and signing the head with the tenant's Ed25519 did:key. verify returns two properties, tamper-evident and attributable; anchor commits the ledger back into the tenant repo.

The full seal/verify/anchor story is its own page: the ledger. This page only shows the state-to-history hop.
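The chain formula can be tried out in a few lines of Python. This is a sketch of the recurrence only, with loud assumptions: the genesis constant is used directly as h_0 in raw bytes, each digest is chained as raw bytes rather than hex, and lines are hashed without their trailing newline. None of those encoding details is stated on this page, and the Ed25519 signature over the head is left out.

```python
import hashlib

GENESIS = b"workbooks-ledger-v1"  # genesis constant named in the text


def chain_head(raw_lines, genesis=GENESIS):
    """Fold h_i = sha256(h_{i-1} ++ raw_line_i) over the lines; return the head as hex."""
    h = genesis  # assumption: h_0 is the genesis bytes themselves
    for line in raw_lines:
        h = hashlib.sha256(h + line).digest()
    return h.hex()


def still_matches(raw_lines, recorded_head):
    """Re-walk the chain and compare against a previously recorded head."""
    return chain_head(raw_lines) == recorded_head
```

Because every link feeds the next, changing any earlier line changes every later hash, so comparing the final head is enough to detect an edit anywhere in the file.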

reading the RECORD

Depth rung. The morning after a run, three verbs answer three questions — what ran, what broke, and whether anyone edited the story. Real output shapes, from the CLI:

$ wb telemetry
SLUG          STAGE  CALLS  ERRORS  MS
wf-1842       done   41     1       183204
wf-1791       done   12     0       60113

$ wb telemetry wf-1842
stage=done calls=41 errors=1 total_ms=183204
  ! step 17 bash: exit 1

$ wb ledger wf-1842
tamper-evident=ok attributable=ok count=41 did=did:key:z6Mk…

The first command is what ran: a runs index, one line per run, with call counts, error counts, and total milliseconds. The second is what broke: stage, calls, errors, and a line per failed step. The third proves nobody edited the story: it re-walks the hash chain over _steps.jsonl and confirms the head still matches its did:key signature. If a single byte of history were altered, tamper-evident would not say ok.

Over HTTP it's the same record, live. POST /api/workflow/todo with {"org": "..."} returns 202 and a slug (wf-<n>); the run lands in /tmp/bb/<slug>, and you poll _status.json for per-task stages — running, done, or error.
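For the HTTP path, a Python sketch of the request side might look like this. The endpoint, JSON body, and /tmp/bb/<slug> location are from the text; the host and port are placeholders, the helper names are hypothetical, and the shape of the 202 response and of _status.json is not documented here, so neither is parsed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"  # placeholder host, not from the source


def build_run_request(org_text, base_url=BASE_URL):
    """Build (but do not send) the POST that starts a run."""
    body = json.dumps({"org": org_text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/workflow/todo",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_path(slug):
    """Where the text says a run's status file lives, given its slug."""
    return "/tmp/bb/" + slug + "/_status.json"


request = build_run_request("* TODO write the changelog")
```

Sending it is one call to `urllib.request.urlopen(request)` against a real server; the sketch stops short of that so it stays runnable without one.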

where it BITES

Honesty section. This grammar is sharp in a few places, and pretending otherwise would betray the whole point of a readable plan.

The divider is cosmetic. Said once already, worth saying twice: the | in a #+TODO: line is decoration to the run engine. The done-set is hard-coded to DONE/CANCELLED/CANCELED. Custom done keywords are selectable states, not skip-on-resume states.

Two parsers disagree. The Elixir workflow engine has its own self-contained parser with the full default keyword set and the #+TODO: override. The kernel — wb query <file.org> — uses orgize with its default config, whose todo keywords are exactly TODO and DONE. So wb query reports state only for literal TODO/DONE; a NEXT or DOING keyword stays inside the title as far as the kernel is concerned. The two surfaces read the same file and see different states — know which one you're asking.

Every heading is a node. There is no task-tag filter. POST a raw board and your * log heading runs as a leaf like everything else. Boards meant to be rendered are not the same shape as outlines meant to be run.

A no-check DONE is self-grading. Without a :DONE-WHEN:, the agent's word is the only evidence. The autopoet hit this honesty-gate directly: a self-report of completion is a hypothesis, not a result, until something independent confirms it. The fix is a gate, not a policy.

Orphans become BLOCKED. A crash or a timeout mid-run leaves a stale active state. On boot, the groundskeeper rewrites stale DOING/TODO headlines to BLOCKED. By design, a crash becomes a held task you can see, never a stuck DOING that lies about active work.

questions people actually ASK

Can I invent my own states?

Yes, for selection — a #+TODO: line replaces the whole keyword set, and agents can claim work by moving through your custom stages. But done-detection stays DONE/CANCELLED/CANCELED. A custom done word like PUBLISHED is a state the engine recognizes, not a state it skips on resume.

Does the engine update my file?

No. There is no write-back code in the run path. Results land in telemetry and the ledger; your headlines stay exactly as you wrote them. Moving a keyword is your edit, or an agent's, in a commit — which is precisely why the git diff is the audit trail.

What's the difference from bd / beads?

Two ledgers, never crossed. bd is the issue tracker — a separate system of record with its own database. The to-do grammar is state living in the plan file itself, with git as its history. Rendered mirrors like BOARD.org can draw from bd, but that's a one-way render, not a sync. They stay distinct on purpose.

Is DONE trustworthy?

Only as strong as its gate. A DONE with a passing :DONE-WHEN: check, run fail-closed in the WASM shell, is a proven fact. A DONE with no check is the agent grading its own homework. The state word is the same; the evidence behind it is not. See validations.

Where do I look the morning after a run?

Three verbs. wb telemetry for what ran, wb telemetry <slug> for what broke, wb ledger <slug> to prove nobody edited the history. All three read files the run left in its workdir — no server required to ask the question.

Why does wb query miss my NEXT tasks?

Because it's the kernel parser, not the workflow engine. orgize's default config only knows TODO and DONE, so a NEXT or DOING keyword stays inside the title there. The Elixir runner knows the full set. Same file, two vocabularies — a real drift, named honestly.

keep GOING

This page is the grammar behind one chip on the org lesson — here's where it connects.

Org, the grammar: the parent. States named as native, not convention.

Validations: the DONE-WHEN ladder, in full.

Boards: no board model, rendered over the plan file.

The ledger: seal, verify, anchor. The full history rail.