your agent is a config file in someone's FRAMEWORK
Every agent framework makes defining an agent a programming task. A Python class you subclass. A YAML schema with a validator. A builder API with a dozen chained calls. A vendor config format only that one product reads. The definition lives in code you have to deploy, in a shape nothing else can parse, and the most failure-prone part of the whole thing — the system prompt — is usually a string literal three imports deep, invisible in any diff that matters.
None of that is inspectable, and none of it is portable. You can't read a competitor's agent, you can't hand yours to a teammate as a file, and you certainly can't ask the same tools you use for everything else to make sense of it. The definition is trapped inside the framework that runs it.
Here it isn't. An agent definition is a plain org file — the same grammar as everything else in the ecosystem. The runtime that turns it into a worker is small enough to read in one sitting. This lesson is that file, and that runtime, with nothing hidden.
the DEFINITION
1. one org file describing a
runnable agent: the first headline tagged
:agent:, a :PROPERTIES: drawer carrying
:ID: :MODEL: :TOOLKITS:
:TAGLINE:, and a ** System prompt heading whose body
becomes the prompt. Parsed by the same kernel that parses workbooks —
no bespoke format, no registration, no build step.
That's the whole contract. There is no second parser, no schema file, no decorator. The agent is discovered like a toolkit — by a tag — and run on the same clean-room substrate as everything else. Here is a complete one that really runs, top to bottom:
#+TITLE: analyst agent * Data Analyst :agent: :PROPERTIES: :ID: analyst :MODEL: xiaomi/mimo-v2.5 :TAGLINE: Processes JSON data via in-WASM commands and persists findings. :TOOLKITS: shell wb sandbox :END: ** System prompt You are *analyst*, a clean-room runtime agent. You have tools: shell (run jq/grep pipelines over input), wb (the wb CLI), vfs_write, and done. - Use the shelljqcommand to extract and aggregate JSON. - Persist any durable finding withwb memory remember <key> <text>. - When the task is complete, call done with a one-line summary.
Roughly twenty-five lines, and there is nothing else — no surrounding harness, no entry point to register. Point the runtime at this file and it has a worker.
what the runtime actually READS
The parser is one module, Workbooks.AgentDef, and it is
73 lines total. You can hold the entire thing in your head, so let's. It
does exactly three things: find the tagged node, read four properties off its
drawer, and split the prompt at a heading.
flowchart LR org["the org file — plain text"] k["oql.wasm
parse_headlines"] find["find the FIRST node
with the agent tag"] rec["%{id, model, toolkits,
tagline, system}"] org --> k k -- "level · title · tags · props · body" --> find find --> rec style org fill:#ffffff,stroke:#121316 style k fill:#aee5c2,stroke:#121316 style find fill:#9fc4e8,stroke:#121316 style rec fill:#9fc4e8,stroke:#121316,stroke-width:2.5px
Read that graph as a short pipeline. Plain org text goes into
oql.wasm — the kernel, a WIT-typed WebAssembly component embedded in
the runtime at compile time, the very same one that parses
workbooks. It returns one map per headline, each carrying
level, title, state, id,
tags, and props. The parser scans those for the first
whose tags contains agent — the same tag-discovery move
that finds :toolkit: nodes — and reads four properties off its
drawer. The result is a five-field record, and that is everything the runtime
knows about your agent.
The mechanics, in the kernel's own terms:
- Discovery is a tag.
parse_headlines(org)then find the node where"agent" in tags. No filename convention, no manifest pointing at it — the tag is the registration. - The four properties come straight off the
:PROPERTIES:drawer::ID:becomes the node's id,:MODEL:the model string,:TAGLINE:a one-liner, and:TOOLKITS:a whitespace-split list — soshell wb sandboxparses to three names. - A missing
:TOOLKITS:is not an error — it defaults to the empty list. A name that isn't installed isn't an error either; you'll see in a moment exactly how it surfaces.
Because the kernel does the parsing, an agent def has no format of its own. It is org, read by the org reader. The thing that makes a workbook legible to a machine is the thing that makes your agent legible too.
the System prompt RULE
The fourth field — system — is the one with a rule worth knowing
exactly, because getting it slightly wrong is the difference between your prompt
and an accident. The system prompt is everything under a heading matching
** System prompt (any star depth, that exact title), up to the next
level-one-or-two heading — a single * or **.
The consequence is the part people miss: deeper *** subsections do
not end the prompt — they're part of it, by design. This is what lets a
serious def nest whole sections — ground rules, tooling notes, a cadence — as
*** children and have all of them land inside one system prompt. Here
is the boundary drawn on a real def, cut down to its skeleton:
* agent :agent: ← parsed: the node, its props only :PROPERTIES: ... :END: The single agent behind workbooks.sh. Runs every 15 minutes... ← NOT in the prompt ** System prompt ← the prompt starts AFTER this line You are Waldo — ... *** Ground rules (absolute) ← *** stays INSIDE the prompt *** TOOLING (hard rule) ← inside *** ORIENTATION BUDGET ← inside * Working notes ← a * or ** heading ENDS the prompt — this is out
Note what is not in the prompt: the prose under the agent node itself,
before ** System prompt. That space is human-facing description — a
note to whoever reads the file — and the prompt boundary deliberately steps over
it. Put your reader-facing summary there; put the model's instructions under the
heading.
the fallback, and the incident behind it
What if there's no ** System prompt heading at all? The parser
doesn't run the agent empty — it falls back to the agent node's body:
everything after the first :END: drawer line. The code comment naming
the reason is unusually candid, and worth quoting because it's a real production
scar:
"an agent authored as a plain def shouldn't silently run prompt-less. This bit bit.ml: the crew ran with EMPTY prompts and all imitated desk."
That happened. A set of agents shipped with no recognized prompt heading; the runtime handed each of them an empty system prompt; with nothing to be, they all imitated one of their number, and one fabricated. The fallback exists so that the worst case of a missing heading is the node body, not nothing. The failure modes line up cleanly:
| what you wrote | what the runtime uses as the prompt |
|---|---|
** System prompt (exact) | the body under it, through any *** children, to the next */** |
** System Prompt (capital P) | no match — falls back to the node body |
** Prompt / any other title | no match — falls back to the node body |
| no prompt heading anywhere | the node body after :END: — better than empty, not what you meant |
The lesson is small and absolute: write ** System prompt exactly.
The convention is load-bearing beyond this one parser — the
Groundskeeper voice agent reads persona from the same heading
and raises if it's missing rather than guessing. Treat the heading as part
of the grammar, because the system does.
toolkits become a prompt INDEX
This is the page's second surprise, and it's worth lingering on. Declaring
:TOOLKITS: git sandbox does not paste two manuals into your
prompt. At run time the runtime turns that list into a tiny index and
appends it — taglines and skill names, nothing more — and the agent pulls the
actual skill bodies on demand. Progressive disclosure, as a parse-time feature.
flowchart LR decl[":TOOLKITS: git sandbox"] disc["discover toolkits in
$WB_TOOLKITS_ROOT"] idx["## Toolkits block
tagline + first 8 skills + (+N more)"] app["appended LAST:
system + idx"] pull["skill body fetched at run time
wb toolkit show <id> <skill>"] decl --> disc --> idx --> app app -.->|"on demand"| pull style decl fill:#ffffff,stroke:#121316 style disc fill:#aee5c2,stroke:#121316 style idx fill:#f3c5a3,stroke:#121316 style app fill:#9fc4e8,stroke:#121316,stroke-width:2.5px style pull fill:#ffffff,stroke:#121316,stroke-dasharray:4 3
Walk the flow. Your declared list is looked up against the toolkit root —
$WB_TOOLKITS_ROOT if it's a real directory, else toolkits/.
Each found toolkit becomes one row: its #+TAGLINE from the manifest,
plus the first eight skill slugs with a (+N more) overflow if
there are extras. That whole block is appended to the prompt as the literal last
thing — system <> "\n\n" <> index. The skill bodies themselves
are never inlined; the agent reaches for them only when it needs one, through the
wb tool. Index in the prompt, manuals on demand.
Here is the real shape of what gets appended — taglines and skills for what's
installed, an honest (not installed) row for what isn't:
## Toolkits You have these toolkits. Before using one, read the relevant skill — call the `wb` tool: `toolkit show <id> <skill>` (or `toolkit search <query>` to find one). - git: Host-brokered commit + push for agent workdirs. skills: overview, publish-flow - sandbox: (not installed)
Three things to take from that block. First, the header text teaches the
read-deeper move — it names the exact wb command to pull a skill
body — so the agent learns how to go deeper without you spelling it out. Second, an
undeclared or misspelled toolkit doesn't crash the run; it shows up as
(not installed), a visible breadcrumb rather than a stack trace.
Third, an empty :TOOLKITS: appends nothing at all — no
header, no block — so a toolkit-less agent's prompt is exactly what you wrote and
not a byte more.
from parse to a live LOOP
Parsing produces a record; run/3 turns it into a running agent. It
assembles the prompt — your system, plus the toolkit index — and hands
it to the agent loop along with the task. Two details of that handoff matter to an
author.
The model is a default you can override. The :MODEL: from your
def is passed as a default, not a mandate — the caller's options win. If
nothing names a model at all, the chain falls back to $WB_LLM_MODEL and
finally to xiaomi/mimo-v2.5. So you write the model you intend; an
operator can swap it per run without touching your file.
The loop is bounded. The agent calls the model, runs any tool calls it
asked for, appends the results, and loops — finishing when the model stops calling
tools or calls done, capped by max_steps (default 12, and
callers raise it for long-horizon work). Every tool call is wall-clock bounded at
150 seconds, and every step is logged to _steps.jsonl and an org
event log — the run leaves a fully readable trace. The loop's internals are their
own lesson; loops owns that. What an author needs is the tool
surface their def gets:
| tool | what it does | who gets it |
|---|---|---|
shell | the in-WASM shell — cat jq grep sort uniq tr… with pipes and vars, no OS process | every agent |
search | semantic recall over the workdir — the files are the memory | every agent |
wb | the wb CLI, including toolkit show/search | every agent |
fetch · web_search | a GET (HTML-stripped) and a host-brokered keyless search | every agent |
file_issue | the metacognitive seam — files into the autopoet backlog | every agent |
vfs_read · vfs_write · done | read/write the workspace; end the run | every agent |
git · publish · image | host-brokered commit+push, publish to the web root, image gen (budget 2/run) | only exec: true (trusted) agents |
The verdict of that table is the honest reconciliation of the
parent lesson's line that an agent gets "exactly one tool: a
shell." The shell is the primary tool — the place the agent does its
thinking-by-doing. Everything else is the host-brokered membrane around it: ways to
read and write the workspace, to commit and publish, to end the run. No path
anywhere exposes native OS execution — the old real-bash hatch was deleted on
purpose. A trusted def gets git and publish; that's a grant,
and it's why you author an exec def like production code.
one def, four DOORS
depth rung · skippable — why a def can assume things about how it's run
The same AgentDef.run/3 is invoked from four places, and knowing them
explains a thing that's otherwise mysterious: how a def can say "trust the
LIFECYCLE line" when that line appears nowhere in the org file.
flowchart TD k["the keeper tick
exec · workdir = tenant git repo
prepends MODE: / LIFECYCLE: to the task"] c["a fleet worker
WB_CREW_DEF manifest · per-member :DEF:"] h["HTTP
POST /api/run · brandnana-ask"] a["the autopoet
supervised, time-bounded"] run["AgentDef.run/3
parse → inject → run"] loop["the Agent loop"] k --> run c --> run h --> run a --> run run --> loop style run fill:#9fc4e8,stroke:#121316,stroke-width:2.5px style loop fill:#aee5c2,stroke:#121316 style k fill:#ffffff,stroke:#121316 style c fill:#ffffff,stroke:#121316 style h fill:#ffffff,stroke:#121316 style a fill:#ffffff,stroke:#121316
Read the diagram as four doors into one room. The keeper tick runs a def on
an interval inside a tenant's git working directory, and it prepends lines
like MODE: … and LIFECYCLE: wake_audit to the task string
before the agent ever sees it — which is why a def can trust those lines as a given,
even though it never wrote them. A fleet worker (the WB_CREW_DEF
manifest path) runs one member of a multi-agent manifest, each member pointing at its
own def file. HTTP runs a def per request, the
long-horizon /api/run letting you poll or stream while it works. And the
autopoet runs through the exact same path, supervised and time-bounded.
The point for an author: your def is data, and the caller supplies the context — the workdir, the task framing, the trust level. Four very different jobs, one parse, one run path. The cadence and lifecycle machinery belong to orchestration and fleets; here it's enough to know the task arrives pre-framed.
reading a real def: WALDO
The analyst is twenty-five lines. The other end of the honest range is Waldo — the
agent behind this project's landing site — at 452 lines. Reading it teaches the
real lesson of authoring: a production def is an operations manual, and the
*** nesting from the prompt rule is exactly what keeps the whole manual
inside one system prompt.
Every one of these is a *** subsection under ** System prompt,
so every one of them is in the prompt:
- Identity and mandate — who it is, what one outcome it grows.
- Territory map — including explicit HANDS-OFF zones the agent must not touch.
- The canon — the standards it holds, referenced rather than re-pasted.
- The staged design process — how a single run is supposed to proceed.
- Board protocol — a native org TODO board (TODO/NEXT/DOING/DONE) it works from.
- The cadence — and a commit-tag taxonomy so its history reads cleanly.
- Ground rules — never fabricate, the prompt-injection boundary, the NO-WORK rule.
- Tooling — the agent's exact tool surface, documented honestly.
- Orientation budget — a hard limit on how much it reads before it acts.
That tooling section is the one to dwell on, because it shows a def documenting its own sandbox without flinching:
"NO ls — it is not implemented … READ THE MANIFEST."
Waldo's prompt tells the model the truth about its environment — which commands exist, which don't, where to look instead — rather than letting it discover the gaps by failing. An author's job, at this scale, is less prose and more operating documentation: the agent will believe what the prompt says, so the prompt had better be accurate about the world the agent wakes up in.
One honest seam to flag, because Waldo itself models it: the prompt says the agent
"runs every 15 minutes" and "every run dies at a 12-minute wall clock." Neither number lives
in the org file — they're deployment config (WB_KEEPER_INTERVAL_MS and the run
timeout). The def describes the cadence to the model; the environment enforces
it. That's a duplication the author keeps in sync by hand, and it's the cleanest example of
the limits we cover next.
the honest LIMITS
The format is intentionally thin, and thin formats have sharp edges. Stated plainly, so nothing surprises you later:
- The heading typo is silent.
** System Promptwith a capital P, or any other title, doesn't match — and instead of erroring, the runtime falls back to the node body. Better than empty; not what you wrote. This is the single most common way to ship a subtly-wrong agent. :TAGLINE:is parsed but cosmetic. It's read into the record, but the runtime has no consumer for the agent tagline beyond the parse itself — treat it as display metadata, not behavior. (A toolkit's#+TAGLINEis consumed — it's what shows up in the injection index — but that's a different field on a different node.):TYPE:is ignored. The real analyst.org declares:TYPE: aiand the parser never reads it — there's notypekey in the record. Don't lean on it.- An undeclared toolkit doesn't fail. A name that isn't installed renders as
(not installed)in the index rather than stopping the run — visible, but only if you look. - Cadence and timeout live in env, not the def. So the prose in your prompt can drift
from reality. Waldo's "every 15 minutes" is a claim;
WB_KEEPER_INTERVAL_MSis the fact. The env contract is its own lesson — runtime config. - The parser can't gate prompt quality. It reads structure; it cannot make your mandate good. A great def is still mostly prompt craft — analyst's 25 lines and Waldo's 452 are the honest range, and the size tracks the responsibility, not the format.
- A def is trusted input. An
exec: truedef getsgitandpublish— real capabilities. Author it like production code, because in the only sense that matters, it is.
questions people actually ASK
Do I need a ** System prompt heading?
No, but you want one. Without it, the prompt is the agent node's body — everything after the
:END: drawer. That fallback exists to keep an agent from running empty, but only the
explicit heading survives once you add other prose sections to the file. Write the heading exactly,
and your prompt is unambiguous.
Can two :agent: nodes share one file?
Only the first is parsed — parse/1 finds one tagged node and stops.
Fleet members live in separate def files, each named by a manifest. One file, one agent, is the working
assumption.
How do I override the model without editing the def?
The :MODEL: in your file is a default. A caller's options win over it, and if nothing
names a model, the chain falls through $WB_LLM_MODEL to a built-in default. So an
operator swaps models per run or per environment without touching your org file.
What happens to a toolkit I didn't declare?
Declaration controls the index appended to your prompt — the compact list of taglines and
skills — not raw access. The wb tool can still read any installed toolkit's manifest and
skills; declaring one is how you surface it in the prompt, not how you unlock it. (And an undeclared
name you do list just shows as "not installed".)
How big should a def be?
As big as the responsibility, no bigger. The analyst is twenty-five lines and does one clean job; Waldo is four-hundred-fifty and runs a whole living site. The format imposes no minimum and no maximum — size tracks what the agent is on the hook for, and most defs sit far closer to the analyst.
Why org, and not YAML or a class?
Because the agent is then legible to the same kernel — and the same eyes — as everything else. No second parser to maintain, no schema to drift, no string literal buried in code. The prompt, the properties, and the structure are all in one plain file you can diff, send, and read aloud.
keep GOING
You can write a def now. These are the neighbors that make it run, schedule, and remember.