the talk outruns the BUILD
Founder speech is the densest spec a company ever produces. The 91-minute voice memo on the drive home, the pitch run-up paced around the kitchen, the meeting where the whole architecture finally clicked out loud — that is the product, described by the one person who holds all of it at once. And it all rots, identically, in a notes app, as a transcript nobody re-reads.
Worse than rot: the talk always outruns the build. Every founder ships sentences before they ship systems — you describe the thing you mean to make in the present tense long before it exists, because that is how you think it through. Which is fine, until nobody tracks which sentences are still fiction. The gap between the spoken system and the shipped system is invisible — right up until a buyer, or a security review, finds it for you and names it in the room.
The usual fix is to talk less, or to caveat everything into mush. Both are losses. The better move is to treat the talk as raw material and build a discipline that knows, line by line, which claims are load-bearing and which are air.
the DEFINITION
claim, n. 1. a load-bearing spoken assertion, extracted from an untouched source as an org TODO item whose state is its honesty — moving from CLAIMED toward a verdict, and reaching a terminal state only with cited evidence: a file, an epic, a test, or a live run.
The move hiding in that definition is the whole lesson: a transcript is a
source, not a document. The working document is the ledger you derive
from it. And because org lets you declare your own TODO
keyword ladder in a single line, the derived document can make honesty
itself into a state machine — the same TODO → DONE you already
trust, but with a conscience bolted on.
seven words of STATE machine
Here is the entire mechanism. One line, at the top of the audit file,
declares a custom keyword ladder — this is real org grammar, the same
#+TODO: line a workflow uses to name its
states:
#+TODO: CLAIMED(c) VALIDATING(v) | KEPT(k) PARTIAL(p) UNKEPT(u) FUTURE(f) RETRACTED(r)
The | is the spine. Left of it are the two working
states — a claim still in flight. Right of it are the five verdicts —
terminal, and each forbidden without a citation. The letters in parentheses
are Emacs fast-selection keys: with the cursor on a headline, C-c C-t offers the ladder and pressing c stamps it
CLAIMED. Read the line left to right and it is a life: a sentence gets pulled
from the recording — CLAIMED — gets checked against the actual repo —
VALIDATING — and lands on exactly one truth.
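As a rough illustration of how that one line splits into working states and verdicts, here is a small shell sketch. The function name `ladder_states` is hypothetical and belongs to no tool named in this piece; it only mirrors what the text describes: everything left of `|` is a working state, everything right of it is terminal, and the parenthesised letter is a selection key rather than part of the keyword.

```shell
# Sketch only: ladder_states is a hypothetical helper, not part of org-mode.
# Prints one line per keyword: "working NAME" or "terminal NAME".
ladder_states() (
  line=${1#*:}            # drop the "#+TODO:" prefix
  kind=working            # everything before "|" is a working state
  set -f                  # no filename expansion while splitting on blanks
  for word in $line; do
    if [ "$word" = "|" ]; then
      kind=terminal       # everything after "|" is a verdict
      continue
    fi
    # Strip the fast-selection key, e.g. KEPT(k) becomes KEPT.
    printf '%s %s\n' "$kind" "${word%%"("*}"
  done
)
```

Fed the ladder above, it prints two `working` lines and five `terminal` lines, each carrying the bare keyword.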
stateDiagram-v2
    [*] --> CLAIMED: extracted from the source
    CLAIMED --> VALIDATING: checking against the repo
    VALIDATING --> KEPT: it's true
    VALIDATING --> PARTIAL: half true
    VALIDATING --> UNKEPT: said, not built
    VALIDATING --> FUTURE: not yet, on purpose
    VALIDATING --> RETRACTED: the claim was wrong
    note right of VALIDATING: every terminal edge needs a cited file, epic, test, or live run
Walk that picture once. A claim enters at CLAIMED and can only leave for
VALIDATING. From VALIDATING there are five exits — KEPT for what is true,
PARTIAL for what is half true, UNKEPT for what was said but not built, FUTURE
for what you deliberately have not built yet, and RETRACTED for a claim that
turned out plain wrong. The note on the side is the rule that gives the whole
diagram its weight: no claim crosses into a verdict without dragging evidence
across with it. The seven words are the same shape as TODO → DONE
— only here, reaching DONE means you proved something.
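The rule in the diagram can be written down as a guard. The sketch below is an illustration under stated assumptions, not the project's workflow engine: `allowed_transition` is a hypothetical name, and treating any non-empty third argument as a citation is a simplification, since nothing here checks that the cited file, epic, test, or run actually exists.

```shell
# Sketch only: allowed_transition is a hypothetical guard.
# $1 = current state, $2 = requested state, $3 = cited evidence (may be empty)
allowed_transition() {
  case "$1>$2" in
    'CLAIMED>VALIDATING')
      return 0 ;;                 # the only way out of CLAIMED
    'VALIDATING>KEPT'|'VALIDATING>PARTIAL'|'VALIDATING>UNKEPT'|\
    'VALIDATING>FUTURE'|'VALIDATING>RETRACTED')
      [ -n "${3-}" ] ;;           # a verdict needs a citation
    *)
      return 1 ;;                 # every other edge is refused
  esac
}
```

Called as `allowed_transition VALIDATING KEPT "epic wb-zyl done"` it succeeds; the same call with an empty third argument fails, and so does any attempt to jump from CLAIMED straight to a verdict. A re-audit that sends a verdict back to VALIDATING is left out on purpose: the guard covers only the edges drawn in the diagram.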
quote, then EVIDENCE
A claim entry has a fixed anatomy, and the discipline lives in keeping its
two halves in two different voices. The headline carries the status and the
claim as one sentence. The body carries a Quote: — the verbatim
words from the source, never paraphrased — and then an
Evidence: or Reality: line: what is actually true
in the repo. One voice is what was said; the other is what
is. Here is a real KEPT entry from the one live audit:
** KEPT a workbook is an HTML file: CSS + JS + WebAssembly + SQLite, gzip-packed
   Quote: "if you looked at a workbook, you would at first glance see an HTML file… natively zipped… a SQLite binary file."
   Evidence: the workbooks-authoring surface ships single-file .html mini-apps bundling a WASM runtime + SQLite + their own source.
The quote is the founder, word for word. The evidence is the repo, answering. They are allowed to disagree — that disagreement is the entire point of the document — so they must never be written by the same hand in the same breath. The fields and who fills them:
| field | whose voice | what it must cite |
|---|---|---|
| headline — status + claim | the auditor | one verdict word from the ladder |
| Quote: | the source, verbatim | the untouched transcript — nothing invented |
| Evidence: / Reality: | the repo | a file, an epic, a test, or a live run |
The verbatim rule is not pedantry. The moment you paraphrase the quote, you have quietly edited the claim toward the version you can defend — and the audit becomes a record of what you wish you had said. The source stays untouched precisely so the gap can stay honest.
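The verbatim rule is mechanical enough to check. Here is a minimal sketch, assuming the quote marks elisions with "…" and the source is a plain-text transcript such as verbatim.txt. The name `quote_is_verbatim` is hypothetical; nothing in the text says the project ships such a helper.

```shell
# Sketch only: quote_is_verbatim is a hypothetical helper, not an existing tool.
# $1 = quote text, $2 = transcript file.
# Succeeds when every fragment of the quote appears word for word in the file.
quote_is_verbatim() (
  quote=$1
  # Collapse line breaks so a quote may span wrapped lines in the source.
  flat=$(tr -s '[:space:]' ' ' < "$2")
  # "…" marks an elision: split there, trim blanks, drop empty pieces.
  printf '%s\n' "$quote" |
    awk '{
      n = split($0, part, "…")
      for (i = 1; i <= n; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", part[i])
        if (part[i] != "") print part[i]
      }
    }' |
    while IFS= read -r fragment; do
      case "$flat" in
        *"$fragment"*) ;;    # found word for word
        *) printf 'not in source: %s\n' "$fragment" >&2; exit 1 ;;
      esac
    done
)
```

The match is deliberately strict: a changed word, a fixed typo, or a tidied comma makes the check fail, which is the behaviour the verbatim rule asks for.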
the five VERDICTS
Depth rung — skippable. The ladder earns its keep on the right-hand side, so here is one specimen for each terminal verdict, pulled straight from the audit of the 91-minute memo. Four are stamped in the ledger; the RETRACTED row is the audit's recommended retraction, argued for but not yet stamped. Each pairs the claim with the evidence its verdict is allowed to cite:
| verdict | a real claim | its cited reality |
|---|---|---|
| KEPT | compilers run inside the sandbox — C, Zig, Rust, Go, JS/TS | epic wb-zyl done; the compilers ghcr package; recipes under runtime/compilers/ |
| PARTIAL | bash is emulated in WebAssembly, no native exec anywhere | the posix shape and the no-native-exec invariant exist — but a hardened interim real-exec tool still does too (epic wb-9ja removes it) |
| UNKEPT | most of Python compiles and runs in the sandbox | there is no Python lane in the compiler set — clang / mrustc / zig / go-yaegi / quickjs. Build it or stop saying it |
| FUTURE | a non-LLM, state-space model trained on runtime telemetry diffuses changes into the engine | deliberately unbuilt — named, scoped, and parked as direction, not pretended as shipped |
| RETRACTED | Erlang gives nine-9s fault tolerance — 99.9999999% | folklore from one Ericsson anecdote, widely contested — recommend dropping the number, keeping the true supervision-tree story |
The UNKEPT and the RETRACTED are the ones that earn the discipline. The Python claim is the kind of sentence that sounds true because the sentence next to it is true — and an audit is the only thing that catches it before a customer does. The nine-9s line is sharper still: not half-built, but wrong — and the verdict's job is to say keep the story, drop the number, rather than quietly defend an indefensible statistic.
audio in, ledger OUT
The ledger is the end of a pipeline, and every step of it is agent-runnable — which is exactly why the whole method is a candidate toolkit: audio goes in, a claims ledger comes out. The real path that produced the one live audit:
flowchart LR
    rec["91-min voice memo<br/>5454.4 sec"] --> stt["ElevenLabs<br/>scribe_v1"]
    stt --> src
    subgraph src["sources/ — untouched, append-only"]
        v["verbatim.txt<br/>8,578 words"]
        w["words.json<br/>per-word timing + logprobs"]
    end
    src --> audit["audits/spoken-thesis.org<br/>claims, evidence-checked"]
    audit --> back["backlog<br/>every UNKEPT / PARTIAL"]
    audit -. "#+SOURCE: points back" .-> src
    style src fill:#fbfaf6,stroke:#121316
    style audit fill:#f2ddb0,stroke:#121316,stroke-width:2.5px
    style back fill:#13d943,stroke:#121316
Read the flow left to right. A 91-minute memo — 5,454 seconds — goes
through ElevenLabs scribe_v1 and lands as two untouched files in a dated
source directory: verbatim.txt, exactly 8,578 words, and
words.json, which carries every word with its start time, end
time, and the model's log-probability — the transcript knows not just what
was said but when, and how sure it was. From those sources the audit
is derived, and its #+SOURCE: line points back at them — the
dotted arrow. The sources are never edited; the audit is the only thing that
changes. And the output edge on the right is the payoff: every UNKEPT and
every PARTIAL falls straight into a backlog.
The append-only rule on sources/ is what makes the whole
thing trustworthy over time. You can re-audit the same memo a year from now
and the words you are checking against have not drifted. The document moves;
the truth it was built from does not.
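One way to hold a directory to that append-only rule is a checksum manifest. The sketch below is an assumption, not something the text says the project does: the helper names and the `SHA256SUMS` file name are hypothetical, and it relies on the GNU coreutils `sha256sum` command.

```shell
# Sketch only: hypothetical helpers; the SHA256SUMS manifest name is an assumption.
# Requires GNU coreutils sha256sum.

# Record a checksum for every file currently under the sources directory ($1).
seal_sources() (
  cd "$1" &&
    find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS
)

# Fail if any sealed file has been edited or removed since it was recorded.
# Files added after sealing are ignored, which is what append-only permits.
verify_sources() (
  cd "$1" && sha256sum -c --quiet SHA256SUMS
)
```

Run `seal_sources` once when the transcript lands and `verify_sources` before each re-audit. Re-running `seal_sources` resets the baseline, so in this sketch it is the one step that has to stay a deliberate human decision.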
the gap is the BACKLOG
This is the thesis the audit states about itself, verbatim: the gap between the spoken system and the shipped system is the backlog. An UNKEPT is not a confession — it is a ticket. The method does not stop at judging; it dispatches. Here is the loop that actually runs on this project, from a claim to filed work:
sequenceDiagram
    participant F as founder
    participant G as groundskeeper agent
    participant A as author model (mercury-2)
    participant W as workflow run
    participant L as the ledger
    F->>G: on a live call — name a weak claim
    G->>G: capture to sources/captures/.org
    G->>A: dispatch a research goal
    A->>W: author an org TODO outline (:done-when: gates)
    W->>L: results land on TASKS.org
    L->>F: next call picks up where this one left off
Trace that exchange. On a live call the founder names a claim that worries
him — in the real capture file, workbook-as-container is flagged as
the weakest pitch line. The groundskeeper agent saves
the thought to a dated capture file and dispatches a research goal. An author
model — inception/mercury-2 — writes that goal up as an org TODO
outline, where each leaf carries a :done-when: shell gate that
must pass for the step to count as done. The run executes, and its results
land on TASKS.org, the task ledger generated from the runtime
and never hand-edited. The next call picks up from there.
The loop closes most sharply on the scariest CLAIMED in the whole audit — thousands of agents per server, insanely high concurrency, whose cited reality is brutal: the first live co-tenant run overloaded a four-agent box. On a real archived call the founder converted that worry into work by voice, in one sentence — file an issue titled load benchmark for concurrent agents, description: we claim thousands of agents per server with no proof, build a benchmark. The agent's reply: issue filed. One claim, one verdict still pending, one benchmark now on the backlog — the method, end to end, by talking.
a ledger you can GREP
Depth rung — skippable. Because every status is a headline keyword in a
plain text file, the health of your pitch is one shell command away. You do
not need a dashboard; you need grep:
$ grep -c '^\*\* UNKEPT' audits/spoken-thesis.org # 1 — said, not built
$ grep -c '^\*\* CLAIMED' audits/spoken-thesis.org # 6 — still unchecked
$ grep '^\*\* ' audits/spoken-thesis.org | awk '{print $2}' | sort | uniq -c
# KEPT 7 · PARTIAL 3 · CLAIMED 6 · UNKEPT 1 · FUTURE 1
Those are the real counts from the one live ledger. They tell you something the prose can't: the ladder is ahead of its own data. It declares RETRACTED and VALIDATING, yet neither has a single live instance. Six claims still sit unchecked at CLAIMED, and the recommended retraction of the nine-9s line has not been stamped RETRACTED yet, only argued for. That is honest, and grep is what makes it visible. Pitch health becomes arithmetic: count the UNKEPTs, list every claim still missing evidence, diff this week's audit against last week's. The discipline is legible to a human, an agent, and a one-line script at the same time.
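The week-over-week diff is only a few lines more. This is a minimal sketch, assuming you keep a dated copy of each audit file and that claims are second-level headlines as in the grep commands above. `audit_drift` is a hypothetical name, not an existing command.

```shell
# Sketch only: audit_drift is a hypothetical helper.
# $1 = last week's audit file, $2 = this week's audit file.
# Prints a diff of the "STATUS claim" lines; exits 0 when nothing moved.
audit_drift() (
  old=$(mktemp) new=$(mktemp)
  trap 'rm -f "$old" "$new"' EXIT
  # Keep only second-level headlines, minus the stars, in a stable order.
  sed -n 's/^\*\* //p' "$1" | sort > "$old"
  sed -n 's/^\*\* //p' "$2" | sort > "$new"
  diff "$old" "$new"
)
```

A claim whose status changed shows up twice: once with `<` under its old keyword and once with `>` under the new one. A claim whose wording was edited shows up the same way, so the diff also catches a headline that was quietly reworded.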
more than a CHECKLIST
Depth rung — skippable. A claims audit is not just a column of verdicts. The same pass over a single 91-minute ramble extracts a full working document. From the one real audit:
| also extracted | what it captures |
|---|---|
| the argument skeleton | the pitch rebuilt as a numbered structure — the spine the claims hang on |
| the audiences | who each part of the talk is actually for |
| the contrast objects | named competitors and foils — E2B, Daytona, Firecracker, StackBlitz, Letta, Tauri |
| the open questions | the dangling threads the founder left unresolved out loud |
One unscripted ramble, run through this method, yields a checked claims ledger, a positioning map, a competitor list, and a question backlog — a working document, not a checklist. The transcript was never the deliverable. The audit is.
where it BITES
Honesty section — and a claims lesson that hid its own UNKEPTs would be a bad joke. Four places this bites.
First, status is judgment, not proof. Someone — a human or an agent — reads the repo and decides KEPT or PARTIAL. The evidence rule constrains that judgment but does not automate it; a generous auditor can still stamp KEPT on a half-truth. The discipline raises the cost of self-deception; it does not abolish it.
Second, KEPT rots. Evidence ages. A claim that was true against last quarter's repo can quietly go stale, and a verdict carries no expiry date. A KEPT is a snapshot, not a guarantee — which is why re-auditing is part of the method, not a failure of it.
Third — and this one is precise — the runtime can't execute this exact
ladder yet. The workflow engine reads a file's
own #+TODO: line, but its parser does not strip the Emacs
fast-selection keys: it reads KEPT(k) as one token, so a
headline stamped ** KEPT does not match and parses as no state at
all. Emacs org-mode itself strips the (k); our runtime parser
doesn't. So today the claims ladder is fully readable by humans, agents, and
grep — but it is not executable workflow state in the engine. That
is a real gap, named here because the method demands it.
Fourth, n equals one. One 91-minute memo has been audited this way.
And the drift validator the method implies — a general checker that reads an
org claims file and flags a verdict without evidence — does not exist yet.
wb content check validates a CMS content tree, and
:done-when: gates are unit tests for org workflows, but a generic
claims-drift linter is itself on the backlog this method generated. The
method's sharpest promise is one of its own UNKEPT items. We would rather tell
you that than imply otherwise.
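To make the missing piece concrete, here is roughly what the smallest version of such a checker could look like. It is a sketch written for this section, not an existing tool and not the linter on the backlog. `unbacked_verdicts` is a hypothetical name, and it assumes each entry keeps its Quote: and Evidence: or Reality: fields on their own lines under a starred headline. It checks only that an evidence line is present, not that what it cites is real.

```shell
# Sketch only: unbacked_verdicts is a hypothetical check, not an existing tool.
# $1 = org claims file.
# Prints every terminal-verdict headline that has no Evidence: or Reality:
# line before the next headline.
unbacked_verdicts() {
  awk '
    function report() { if (verdict != "" && !backed) print verdict }
    /^\*+ / {
      report()                      # close out the previous entry
      verdict = ""; backed = 0
      if ($2 ~ /^(KEPT|PARTIAL|UNKEPT|FUTURE|RETRACTED)$/) verdict = $0
      next
    }
    /^[ \t]*(Evidence|Reality):/ { backed = 1 }
    END { report() }                # close out the last entry
  ' "$1"
}
```

CLAIMED and VALIDATING headlines are skipped because the evidence rule applies only to terminal states. An empty result means every verdict carries at least a line of cited reality; judging whether that citation holds up is still the auditor's job.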
questions people actually ASK
Isn't this just fact-checking the pitch?
No — fact-checking ends at a verdict column. This doesn't. VALIDATING and the verdicts feed a backlog: an UNKEPT is a ticket, not a scarlet letter, and the loop dispatches research that closes it. The output isn't a grade, it's work.
Who assigns the status?
Whoever can cite evidence — a human or an agent, it doesn't matter. The rule isn't about authority, it's about the citation: no terminal status crosses into the file without a file, an epic, a test, or a live run behind it.
Why org and not a spreadsheet?
Because the ladder lives in the same grammar as the workflows it spawns. A claim is an org TODO; the research that closes it is an org TODO; the task ledger is org. A spreadsheet would make the audit a fifth system of record — the exact fragmentation org is here to end.
Can I retract a KEPT?
Yes, and you should when evidence ages. A verdict is a snapshot of a moment, not a permanent ruling. Re-audit; if the repo moved out from under a KEPT, move the claim back to VALIDATING and check it again.
Do I need the runtime to do this?
No. The whole method runs on a transcript and a text file. The seven-word
#+TODO: line, the quote-then-evidence anatomy, and the
evidence-or-no-verdict rule are a discipline, not a product — you can audit
your own last pitch tonight in any editor.
Where does an UNKEPT claim actually go?
Into the backlog seam the autopoet reads — the
same file_issue lane a working agent uses to flag its own
gaps. An UNKEPT spoken in a pitch and a bug filed by an agent mid-task land
in the same place: work, in the declarative layer.