
a worker WRITTEN in plain org

An agent definition is one org file — a :agent:-tagged headline, four drawer properties, and a ** System prompt heading. No schema, no class, no build step. The whole parser is 73 lines, and you can hold all of it in your head. This page is what the runtime actually reads — and the three mistakes that make an agent run silently broken.

authoring · 11 min read
[Illustration: a small figure at a monumental drafting table inscribes a single luminous page; the page rises as a towering robot worker stepping off the desk into a bright atrium. 1970s sci-fi style, warm and optimistic.]

your agent is a config file in someone's FRAMEWORK

Every agent framework makes defining an agent a programming task. A Python class you subclass. A YAML schema with a validator. A builder API with a dozen chained calls. A vendor config format only that one product reads. The definition lives in code you have to deploy, in a shape nothing else can parse, and the most failure-prone part of the whole thing — the system prompt — is usually a string literal three imports deep, invisible in any diff that matters.

None of that is inspectable, and none of it is portable. You can't read a competitor's agent, you can't hand yours to a teammate as a file, and you certainly can't ask the same tools you use for everything else to make sense of it. The definition is trapped inside the framework that runs it.

Here it isn't. An agent definition is a plain org file — the same grammar as everything else in the ecosystem. The runtime that turns it into a worker is small enough to read in one sitting. This lesson is that file, and that runtime, with nothing hidden.

the DEFINITION

a·gent def·i·ni·tion /ˈeɪ·dʒənt ˌdef·ə·ˈnɪ·ʃən/ noun

1. one org file describing a runnable agent: the first headline tagged :agent:, a :PROPERTIES: drawer carrying :ID: :MODEL: :TOOLKITS: :TAGLINE:, and a ** System prompt heading whose body becomes the prompt. Parsed by the same kernel that parses workbooks — no bespoke format, no registration, no build step.

That's the whole contract. There is no second parser, no schema file, no decorator. The agent is discovered like a toolkit — by a tag — and run on the same clean-room substrate as everything else. Here is a complete one that really runs, top to bottom:

#+TITLE: analyst agent

* Data Analyst                                                     :agent:
  :PROPERTIES:
  :ID:        analyst
  :MODEL:     xiaomi/mimo-v2.5
  :TAGLINE:   Processes JSON data via in-WASM commands and persists findings.
  :TOOLKITS:  shell wb sandbox
  :END:

** System prompt

You are *analyst*, a clean-room runtime agent. You have tools: shell
(run jq/grep pipelines over input), wb (the wb CLI), vfs_write, and done.

- Use the shell jq command to extract and aggregate JSON.
- Persist any durable finding with wb memory remember <key> <text>.
- When the task is complete, call done with a one-line summary.

Roughly twenty-five lines, and there is nothing else — no surrounding harness, no entry point to register. Point the runtime at this file and it has a worker.

what the runtime actually READS

The parser is one module, Workbooks.AgentDef, and it is 73 lines total. You can hold the entire thing in your head, so let's. It does exactly three things: find the tagged node, read four properties off its drawer, and split the prompt at a heading.

flowchart LR
  org["the org file — plain text"]
  k["oql.wasm<br/>parse_headlines"]
  find["find the FIRST node<br/>with the agent tag"]
  rec["%{id, model, toolkits,<br/>tagline, system}"]
  org --> k
  k -- "level · title · tags · props · body" --> find
  find --> rec
  style org fill:#ffffff,stroke:#121316
  style k fill:#aee5c2,stroke:#121316
  style find fill:#9fc4e8,stroke:#121316
  style rec fill:#9fc4e8,stroke:#121316,stroke-width:2.5px

Read that graph as a short pipeline. Plain org text goes into oql.wasm, the kernel: a WIT-typed WebAssembly component embedded in the runtime at compile time, the very same one that parses workbooks. It returns one map per headline, each carrying level, title, state, id, tags, and props. The parser scans those for the first whose tags contain agent (the same tag-discovery move that finds :toolkit: nodes) and reads four properties off its drawer. The result is a five-field record, and that is everything the runtime knows about your agent.

The mechanics, in the kernel's own terms:

  • Discovery is a tag. parse_headlines(org) then find the node where "agent" in tags. No filename convention, no manifest pointing at it — the tag is the registration.
  • The four properties come straight off the :PROPERTIES: drawer: :ID: becomes the node's id, :MODEL: the model string, :TAGLINE: a one-liner, and :TOOLKITS: a whitespace-split list — so shell wb sandbox parses to three names.
  • A missing :TOOLKITS: is not an error — it defaults to the empty list. A name that isn't installed isn't an error either; you'll see in a moment exactly how it surfaces.

Because the kernel does the parsing, an agent def has no format of its own. It is org, read by the org reader. The thing that makes a workbook legible to a machine is the thing that makes your agent legible too.

the System prompt RULE

The fifth field, system, is the one with a rule worth knowing exactly, because getting it slightly wrong is the difference between your prompt and an accident. The system prompt is everything under a heading matching ** System prompt (any star depth, that exact title), up to the next level-one or level-two heading: a line that opens with a single * or **.

The consequence is the part people miss: deeper *** subsections do not end the prompt — they're part of it, by design. This is what lets a serious def nest whole sections — ground rules, tooling notes, a cadence — as *** children and have all of them land inside one system prompt. Here is the boundary drawn on a real def, cut down to its skeleton:

* agent :agent:            ← parsed: the node, its props only
  :PROPERTIES: ... :END:
  The single agent behind workbooks.sh. Runs every 15 minutes...   ← NOT in the prompt
** System prompt           ← the prompt starts AFTER this line
   You are Waldo — ...
*** Ground rules (absolute)        ← *** stays INSIDE the prompt
*** TOOLING (hard rule)            ← inside
*** ORIENTATION BUDGET             ← inside
* Working notes            ← a * or ** heading ENDS the prompt — this is out

Note what is not in the prompt: the prose under the agent node itself, before ** System prompt. That space is human-facing description — a note to whoever reads the file — and the prompt boundary deliberately steps over it. Put your reader-facing summary there; put the model's instructions under the heading.
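The boundary rule is easier to trust once you see it as code. Below is a minimal Python sketch of it, assuming the heading is searched for across the whole file and that the body is whitespace-trimmed; neither detail is confirmed by the source, and system_prompt is a name invented here.

```python
import re
from typing import Optional

HEADING = re.compile(r"^(\*+)\s+(.*?)\s*$")


def system_prompt(org: str) -> Optional[str]:
    """Body under the 'System prompt' heading, or None if there is none."""
    lines = org.splitlines()
    start = None
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        # Any star depth, but the title must match exactly.
        if m and m.group(2) == "System prompt":
            start = i + 1
            break
    if start is None:
        return None
    body = []
    for line in lines[start:]:
        m = HEADING.match(line)
        # A level-one or level-two heading ends the prompt;
        # *** and deeper headings are kept as part of it.
        if m and len(m.group(1)) <= 2:
            break
        body.append(line)
    return "\n".join(body).strip()
```

Run over the skeleton above, this keeps the three *** sections, drops the human-facing note under the agent node, and stops at * Working notes.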

the fallback, and the incident behind it

What if there's no ** System prompt heading at all? The parser doesn't run the agent empty — it falls back to the agent node's body: everything after the first :END: drawer line. The code comment naming the reason is unusually candid, and worth quoting because it's a real production scar:

"an agent authored as a plain def shouldn't silently run prompt-less. This bit bit.ml: the crew ran with EMPTY prompts and all imitated desk."

That happened. A set of agents shipped with no recognized prompt heading; the runtime handed each of them an empty system prompt; with nothing to be, they all imitated one of their number, and one fabricated. The fallback exists so that the worst case of a missing heading is the node body, not nothing. The failure modes line up cleanly:

| what you wrote | what the runtime uses as the prompt |
|---|---|
| ** System prompt (exact) | the body under it, through any *** children, to the next * or ** |
| ** System Prompt (capital P) | no match; falls back to the node body |
| ** Prompt / any other title | no match; falls back to the node body |
| no prompt heading anywhere | the node body after :END:; better than empty, not what you meant |

The lesson is small and absolute: write ** System prompt exactly. The convention is load-bearing beyond this one parser — the Groundskeeper voice agent reads persona from the same heading and raises if it's missing rather than guessing. Treat the heading as part of the grammar, because the system does.
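To make the fallback concrete, here is a hedged Python sketch of the decision. It assumes the node body runs from the first :END: line to the end of the text, which is one reading of "everything after the first :END: drawer line"; the function names and the returned source label are hypothetical.

```python
import re
from typing import List, Optional, Tuple

PROMPT_HEADING = re.compile(r"^\*+\s+System prompt\s*$")
TOP_HEADING = re.compile(r"^\*{1,2}\s+\S")  # a level-one or level-two heading


def explicit_prompt(lines: List[str]) -> Optional[str]:
    """Body under an exactly titled 'System prompt' heading, if any."""
    for i, line in enumerate(lines):
        if PROMPT_HEADING.match(line):
            body = []
            for nxt in lines[i + 1:]:
                if TOP_HEADING.match(nxt):
                    break
                body.append(nxt)
            return "\n".join(body).strip()
    return None


def fallback_body(lines: List[str]) -> str:
    """Everything after the first :END: line (assumed extent of the node body)."""
    for i, line in enumerate(lines):
        if line.strip() == ":END:":
            return "\n".join(lines[i + 1:]).strip()
    return ""


def resolve_prompt(org: str) -> Tuple[str, str]:
    """Return (prompt, source), where source is 'heading' or 'fallback'."""
    lines = org.splitlines()
    prompt = explicit_prompt(lines)
    if prompt is not None:
        return prompt, "heading"
    return fallback_body(lines), "fallback"
```

With a capital-P heading the second branch fires, and the mistyped heading line itself ends up inside the prompt text: better than empty, but visibly not what the author meant.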

toolkits become a prompt INDEX

This is the page's second surprise, and it's worth lingering on. Declaring :TOOLKITS: git sandbox does not paste two manuals into your prompt. At run time the runtime turns that list into a tiny index and appends it (taglines and skill names, nothing more), and the agent pulls the actual skill bodies on demand. Progressive disclosure, as a run-time feature.

flowchart LR
  decl[":TOOLKITS: git sandbox"]
  disc["discover toolkits in<br/>$WB_TOOLKITS_ROOT"]
  idx["## Toolkits block<br/>tagline + first 8 skills + (+N more)"]
  app["appended LAST:<br/>system + idx"]
  pull["skill body fetched at run time<br/>wb toolkit show <id> <skill>"]
  decl --> disc --> idx --> app
  app -.->|"on demand"| pull
  style decl fill:#ffffff,stroke:#121316
  style disc fill:#aee5c2,stroke:#121316
  style idx fill:#f3c5a3,stroke:#121316
  style app fill:#9fc4e8,stroke:#121316,stroke-width:2.5px
  style pull fill:#ffffff,stroke:#121316,stroke-dasharray:4 3

Walk the flow. Your declared list is looked up against the toolkit root — $WB_TOOLKITS_ROOT if it's a real directory, else toolkits/. Each found toolkit becomes one row: its #+TAGLINE from the manifest, plus the first eight skill slugs with a (+N more) overflow if there are extras. That whole block is appended to the prompt as the literal last thing — system <> "\n\n" <> index. The skill bodies themselves are never inlined; the agent reaches for them only when it needs one, through the wb tool. Index in the prompt, manuals on demand.
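The root lookup in that walk-through is a two-way choice, sketched here in Python. The function name toolkits_root and the injectable env argument are conveniences invented for this sketch; only the variable name and the toolkits/ fallback come from the text.

```python
import os
from typing import Mapping, Optional


def toolkits_root(env: Optional[Mapping[str, str]] = None) -> str:
    """Use $WB_TOOLKITS_ROOT when it names a real directory, else 'toolkits'."""
    env = os.environ if env is None else env
    candidate = env.get("WB_TOOLKITS_ROOT", "")
    if candidate and os.path.isdir(candidate):
        return candidate
    # Unset, empty, or not a directory: fall back to the relative default.
    return "toolkits"
```

Passing a plain dict for env makes the rule easy to exercise without touching the real environment.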

Here is the real shape of what gets appended — taglines and skills for what's installed, an honest (not installed) row for what isn't:

## Toolkits

You have these toolkits. Before using one, read the relevant skill — call the
`wb` tool: `toolkit show <id> <skill>` (or `toolkit search <query>` to find one).

- git: Host-brokered commit + push for agent workdirs.
  skills: overview, publish-flow
- sandbox: (not installed)

Three things to take from that block. First, the header text teaches the read-deeper move: it names the exact wb command to pull a skill body, so the agent learns how to go deeper without you spelling it out. Second, a misspelled or uninstalled toolkit doesn't crash the run; it shows up as (not installed), a visible breadcrumb rather than a stack trace. Third, an empty :TOOLKITS: appends nothing at all (no header, no block), so a toolkit-less agent's prompt is exactly what you wrote and not a byte more.
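Those three behaviors fit in a short Python sketch. Treat it as a model of the described output, not the runtime's code: the installed mapping, the function names, and the exact placement of the (+N more) marker are assumptions, and the header is shortened to its first line (the real one also carries the how-to-read-deeper sentence shown above).

```python
from typing import Dict, List, Optional, Tuple

MAX_SKILLS = 8  # only the first eight skill slugs are listed

# Hypothetical shape: toolkit name -> (tagline, skill slugs)
Installed = Dict[str, Tuple[str, List[str]]]


def render_row(name: str, toolkit: Optional[Tuple[str, List[str]]]) -> str:
    """One index row: tagline and skills, or an honest (not installed)."""
    if toolkit is None:
        return f"- {name}: (not installed)"
    tagline, skills = toolkit
    shown = ", ".join(skills[:MAX_SKILLS])
    extra = len(skills) - MAX_SKILLS
    if extra > 0:
        shown += f" (+{extra} more)"
    return f"- {name}: {tagline}\n  skills: {shown}"


def toolkit_index(declared: List[str], installed: Installed,
                  header: str = "## Toolkits") -> str:
    """Empty declaration list means no block at all, not even a header."""
    if not declared:
        return ""
    rows = [render_row(name, installed.get(name)) for name in declared]
    return header + "\n\n" + "\n".join(rows)


def assemble(system: str, declared: List[str], installed: Installed) -> str:
    """The index is appended as the last thing in the prompt."""
    index = toolkit_index(declared, installed)
    return system if not index else system + "\n\n" + index
```

Note that skill bodies never appear here: the index carries names only, and the agent is expected to fetch a body through the wb tool when it needs one.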

from parse to a live LOOP

Parsing produces a record; run/3 turns it into a running agent. It assembles the prompt — your system, plus the toolkit index — and hands it to the agent loop along with the task. Two details of that handoff matter to an author.

The model is a default you can override. The :MODEL: from your def is passed as a default, not a mandate — the caller's options win. If nothing names a model at all, the chain falls back to $WB_LLM_MODEL and finally to xiaomi/mimo-v2.5. So you write the model you intend; an operator can swap it per run without touching your file.
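As a sketch, that precedence chain reads as a single expression. The option key "model" and the function name are assumptions for illustration; the order (caller, def, $WB_LLM_MODEL, built-in) is the one described above.

```python
import os
from typing import Mapping, Optional

BUILTIN_DEFAULT = "xiaomi/mimo-v2.5"


def resolve_model(caller_opts: Mapping[str, str],
                  def_model: Optional[str],
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Caller option wins, then the def's :MODEL:, then env, then built-in."""
    env = os.environ if env is None else env
    return (
        caller_opts.get("model")      # hypothetical option key
        or def_model                  # the :MODEL: property, if present
        or env.get("WB_LLM_MODEL")
        or BUILTIN_DEFAULT
    )
```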

The loop is bounded. The agent calls the model, runs any tool calls it asked for, appends the results, and loops — finishing when the model stops calling tools or calls done, capped by max_steps (default 12, and callers raise it for long-horizon work). Every tool call is wall-clock bounded at 150 seconds, and every step is logged to _steps.jsonl and an org event log — the run leaves a fully readable trace. The loop's internals are their own lesson; loops owns that. What an author needs is the tool surface their def gets:

| tool | what it does | who gets it |
|---|---|---|
| shell | the in-WASM shell: cat jq grep sort uniq tr… with pipes and vars, no OS process | every agent |
| search | semantic recall over the workdir; the files are the memory | every agent |
| wb | the wb CLI, including toolkit show/search | every agent |
| fetch · web_search | a GET (HTML-stripped) and a host-brokered keyless search | every agent |
| file_issue | the metacognitive seam; files into the autopoet backlog | every agent |
| vfs_read · vfs_write · done | read/write the workspace; end the run | every agent |
| git · publish · image | host-brokered commit+push, publish to the web root, image gen (budget 2/run) | only exec: true (trusted) agents |

The verdict of that table is the honest reconciliation of the parent lesson's line that an agent gets "exactly one tool: a shell." The shell is the primary tool — the place the agent does its thinking-by-doing. Everything else is the host-brokered membrane around it: ways to read and write the workspace, to commit and publish, to end the run. No path anywhere exposes native OS execution — the old real-bash hatch was deleted on purpose. A trusted def gets git and publish; that's a grant, and it's why you author an exec def like production code.
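A rough Python sketch ties the tool grant to the bounded loop. It is deliberately simplified: the per-call 150-second bound and the step logging are left out, and call_model, run_tool, the message tuples, and the result dicts are hypothetical shapes invented here, not the runtime's interfaces.

```python
from typing import Callable, Dict, List

BASE_TOOLS = ["shell", "search", "wb", "fetch", "web_search",
              "file_issue", "vfs_read", "vfs_write", "done"]
TRUSTED_TOOLS = ["git", "publish", "image"]


def tool_surface(exec_trusted: bool) -> List[str]:
    """Every agent gets the base set; only trusted defs get the extras."""
    return BASE_TOOLS + (TRUSTED_TOOLS if exec_trusted else [])


def run_loop(call_model: Callable, run_tool: Callable, prompt: str,
             task: str, tools: List[str], max_steps: int = 12) -> Dict:
    """Call the model, run requested tools, append results, repeat."""
    messages = [("system", prompt), ("user", task)]
    for step in range(1, max_steps + 1):
        # Hypothetical reply shape: {"text": str, "tool_calls": [(name, args)]}
        reply = call_model(messages, tools)
        messages.append(("assistant", reply["text"]))
        calls = reply["tool_calls"]
        if not calls:
            return {"status": "stopped", "steps": step}
        for name, args in calls:
            if name == "done":
                return {"status": "done", "steps": step, "summary": args}
            if name not in tools:
                messages.append(("tool", f"{name}: not available"))
                continue
            messages.append(("tool", run_tool(name, args)))
    # The cap is what keeps a looping agent from running forever.
    return {"status": "max_steps", "steps": max_steps}
```

The loop ends in one of three ways, matching the description: the model stops calling tools, it calls done, or the step cap is reached.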

one def, four DOORS

depth rung · skippable — why a def can assume things about how it's run

The same AgentDef.run/3 is invoked from four places, and knowing them explains a thing that's otherwise mysterious: how a def can say "trust the LIFECYCLE line" when that line appears nowhere in the org file.

flowchart TD
  k["the keeper tick<br/>exec · workdir = tenant git repo<br/>prepends MODE: / LIFECYCLE: to the task"]
  c["a fleet worker<br/>WB_CREW_DEF manifest · per-member :DEF:"]
  h["HTTP<br/>POST /api/run · brandnana-ask"]
  a["the autopoet<br/>supervised, time-bounded"]
  run["AgentDef.run/3<br/>parse → inject → run"]
  loop["the Agent loop"]
  k --> run
  c --> run
  h --> run
  a --> run
  run --> loop
  style run fill:#9fc4e8,stroke:#121316,stroke-width:2.5px
  style loop fill:#aee5c2,stroke:#121316
  style k fill:#ffffff,stroke:#121316
  style c fill:#ffffff,stroke:#121316
  style h fill:#ffffff,stroke:#121316
  style a fill:#ffffff,stroke:#121316

Read the diagram as four doors into one room. The keeper tick runs a def on an interval inside a tenant's git working directory, and it prepends lines like MODE: … and LIFECYCLE: wake_audit to the task string before the agent ever sees it — which is why a def can trust those lines as a given, even though it never wrote them. A fleet worker (the WB_CREW_DEF manifest path) runs one member of a multi-agent manifest, each member pointing at its own def file. HTTP runs a def per request, the long-horizon /api/run letting you poll or stream while it works. And the autopoet runs through the exact same path, supervised and time-bounded.

The point for an author: your def is data, and the caller supplies the context — the workdir, the task framing, the trust level. Four very different jobs, one parse, one run path. The cadence and lifecycle machinery belong to orchestration and fleets; here it's enough to know the task arrives pre-framed.
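"The task arrives pre-framed" can be pictured with a tiny Python sketch of what a caller like the keeper tick might do before the run. The exact line layout, the blank-line separator, the function name, and the mode value used in the example are all assumptions; only the MODE: and LIFECYCLE: prefixes and the wake_audit value appear in the text.

```python
def frame_task(task: str, mode: str, lifecycle: str) -> str:
    """Prepend caller-supplied context lines to the task string."""
    # Hypothetical layout: two header lines, a blank line, then the task.
    return f"MODE: {mode}\nLIFECYCLE: {lifecycle}\n\n{task}"


# Example with a made-up mode value:
framed = frame_task("Audit the landing page.", "example-mode", "wake_audit")
```

The def never contains these lines; it only documents that they will be there when the agent wakes.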

reading a real def: WALDO

The analyst is twenty-five lines. The other end of the honest range is Waldo — the agent behind this project's landing site — at 452 lines. Reading it teaches the real lesson of authoring: a production def is an operations manual, and the *** nesting from the prompt rule is exactly what keeps the whole manual inside one system prompt.

Every one of these is a *** subsection under ** System prompt, so every one of them is in the prompt:

  • Identity and mandate — who it is, what one outcome it grows.
  • Territory map — including explicit HANDS-OFF zones the agent must not touch.
  • The canon — the standards it holds, referenced rather than re-pasted.
  • The staged design process — how a single run is supposed to proceed.
  • Board protocol — a native org TODO board (TODO/NEXT/DOING/DONE) it works from.
  • The cadence — and a commit-tag taxonomy so its history reads cleanly.
  • Ground rules — never fabricate, the prompt-injection boundary, the NO-WORK rule.
  • Tooling — the agent's exact tool surface, documented honestly.
  • Orientation budget — a hard limit on how much it reads before it acts.

That tooling section is the one to dwell on, because it shows a def documenting its own sandbox without flinching:

"NO ls — it is not implemented … READ THE MANIFEST."

Waldo's prompt tells the model the truth about its environment — which commands exist, which don't, where to look instead — rather than letting it discover the gaps by failing. An author's job, at this scale, is less prose and more operating documentation: the agent will believe what the prompt says, so the prompt had better be accurate about the world the agent wakes up in.

One honest seam to flag, because Waldo itself models it: the prompt says the agent "runs every 15 minutes" and "every run dies at a 12-minute wall clock." Neither number lives in the org file — they're deployment config (WB_KEEPER_INTERVAL_MS and the run timeout). The def describes the cadence to the model; the environment enforces it. That's a duplication the author keeps in sync by hand, and it's the cleanest example of the limits we cover next.

the honest LIMITS

The format is intentionally thin, and thin formats have sharp edges. Stated plainly, so nothing surprises you later:

  • The heading typo is silent. ** System Prompt with a capital P, or any other title, doesn't match — and instead of erroring, the runtime falls back to the node body. Better than empty; not what you wrote. This is the single most common way to ship a subtly-wrong agent.
  • :TAGLINE: is parsed but cosmetic. It's read into the record, but the runtime has no consumer for the agent tagline beyond the parse itself — treat it as display metadata, not behavior. (A toolkit's #+TAGLINE is consumed — it's what shows up in the injection index — but that's a different field on a different node.)
  • :TYPE: is ignored. The real analyst.org declares :TYPE: ai and the parser never reads it — there's no type key in the record. Don't lean on it.
  • An uninstalled toolkit doesn't fail. A declared name that isn't installed renders as (not installed) in the index rather than stopping the run: visible, but only if you look.
  • Cadence and timeout live in env, not the def. So the prose in your prompt can drift from reality. Waldo's "every 15 minutes" is a claim; WB_KEEPER_INTERVAL_MS is the fact. The env contract is its own lesson — runtime config.
  • The parser can't gate prompt quality. It reads structure; it cannot make your mandate good. A great def is still mostly prompt craft — analyst's 25 lines and Waldo's 452 are the honest range, and the size tracks the responsibility, not the format.
  • A def is trusted input. An exec: true def gets git and publish — real capabilities. Author it like production code, because in the only sense that matters, it is.
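Since the heading typo is silent at run time, it is cheap to catch before deploying. The following is a hypothetical pre-flight check in Python, not something the runtime ships; the function name and the warning wording are invented for this sketch.

```python
import re
from typing import List

HEADING = re.compile(r"^\*+\s+(.*?)\s*$")


def lint_prompt_heading(org: str) -> List[str]:
    """Warn when the def would silently fall back to the node body."""
    titles = [m.group(1) for m in map(HEADING.match, org.splitlines()) if m]
    if "System prompt" in titles:
        return []
    # A title that differs only by letter case is the classic near miss.
    near = [t for t in titles if t.casefold() == "system prompt"]
    if near:
        return [f'heading "{near[0]}" will not match; '
                'write "System prompt" exactly']
    return ["no System prompt heading; "
            "the runtime will fall back to the node body"]
```

An empty list means the heading is exact; anything else is a reason to open the file before the agent runs on the wrong prompt.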

questions people actually ASK

Do I need a ** System prompt heading?

No, but you want one. Without it, the prompt is the agent node's body — everything after the :END: drawer. That fallback exists to keep an agent from running empty, but only the explicit heading survives once you add other prose sections to the file. Write the heading exactly, and your prompt is unambiguous.

Can two :agent: nodes share one file?

Only the first is parsed — parse/1 finds one tagged node and stops. Fleet members live in separate def files, each named by a manifest. One file, one agent, is the working assumption.

How do I override the model without editing the def?

The :MODEL: in your file is a default. A caller's options win over it, and if nothing names a model, the chain falls through $WB_LLM_MODEL to a built-in default. So an operator swaps models per run or per environment without touching your org file.

What happens to a toolkit I didn't declare?

Declaration controls the index appended to your prompt (the compact list of taglines and skills), not raw access. The wb tool can still read any installed toolkit's manifest and skills; declaring one is how you surface it in the prompt, not how you unlock it. (And a name you declare that isn't installed just shows as "not installed".)

How big should a def be?

As big as the responsibility, no bigger. The analyst is twenty-five lines and does one clean job; Waldo is four-hundred-fifty and runs a whole living site. The format imposes no minimum and no maximum — size tracks what the agent is on the hook for, and most defs sit far closer to the analyst.

Why org, and not YAML or a class?

Because the agent is then legible to the same kernel — and the same eyes — as everything else. No second parser to maintain, no schema to drift, no string literal buried in code. The prompt, the properties, and the structure are all in one plain file you can diff, send, and read aloud.

keep GOING

You can write a def now. These are the neighbors that make it run, schedule, and remember.