memory is always a SECOND system
Every agent memory you've seen is a bolt-on. A vector database the agent writes embeddings into. A "memory" API with its own schema and its own retention rules. A store that slowly drifts away from what's actually true, that nobody can open and read, and that dies the day the vendor does. The shape is always the same: the work lives in one place and the memory of the work lives in another, and nothing structural connects them.
The cost shows up the moment you want to check the agent. When it forgets, misremembers, or claims it did something it didn't, there's no artifact to diff against — the memory is a black box with an embedding index inside it. You can ask it what it remembers; you can't audit what it remembers. The most context-hungry collaborators we've ever had arrived, and the industry's answer was to give them a memory nobody else can read.
This page is one answer to that, and it's almost embarrassingly plain: delete the second system entirely. Let the agent's memory be the one thing every developer already knows how to read, diff, and revert — a directory of files under version control.
the DEFINITION
1. an agent's working directory that is also
a git repository of plain files — where the files are the
memory and the commits are the changelog. The agent remembers by
writing a file, recalls by embedding the files it has now, and reports what
it did with git log. No separate store.
The agent's own search tool says it in the runtime, verbatim:
recall by meaning — semantic search over the org/code files in your working
context; no separate memory store; the files ARE the memory. To remember
something, write it as an org file — it becomes searchable automatically.
That isn't marketing copy. It's the tool description the model reads at
runtime, and the rest of this lesson is the machinery that makes it true.
one tree, five WRITERS
The repo is one directory — WB_DATA/<tenant>, a real git
repo the runtime git inits the first time an agent runs there. But
five different writers contribute to it, each owning its own paths, and the
whole design rests on keeping those territories clear. Who writes what:
| path | written by | read by | committed? |
|---|---|---|---|
| plan.org · content/ · blog/ | the agent | everyone — peers, humans, its own next run | yes |
| src/ shell · the agent def · design.org · skills/ | the team (humans + CI) | the agent, as read-only canon | yes |
| rem/*.org · rem/manifest.json | the dream process (a separate sleep model) | the agent's next run, at step 0 | yes |
| _steps.jsonl · /events.org | the runtime (automatically, every tool call) | the dream process; humans auditing | yes |
| .gitignore · .workbooks/<tenant>.ed25519 · session data | the runtime | the host only | never (gitignored) |
Two facts in that table do a lot of work. First, the agent's territory is
prose-law in its own definition — your territory: src/sections/grown/,
the content/** partials, blog/, plan.org,
strategy/ — nothing else. The boundary isn't enforced by the
runtime; it's written into the agent and visible in the diff when it's
crossed. Second, the bottom row never enters version control: the per-tenant
Ed25519 keypair that backs the agent's did:key identity lives
inside the same tree but stays untracked. The .gitignore
the runtime writes is doing double duty — it's the privacy boundary and
the share boundary, because packing the repo for anyone else drops exactly
the parts git check-ignore flags. The agent's native git instinct
is the security model.
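Because the boundary lives in prose rather than in the runtime, a reviewer or a CI check can approximate it from the table. A minimal sketch, assuming glob-style patterns transcribed by hand from the rows above; the names `TERRITORIES`, `owner_of` and `out_of_lane` are hypothetical, not part of the runtime:

```python
from fnmatch import fnmatch

# Hypothetical territory map, transcribed from the table above.
# Order matters: the first matching owner wins, so the agent's
# src/sections/grown/ carve-out is listed before the team's src/.
TERRITORIES = [
    ("agent", ["plan.org", "content/*", "blog/*", "strategy/*",
               "src/sections/grown/*"]),
    ("dream", ["rem/*"]),
    ("runtime", ["_steps.jsonl", ".gitignore", ".workbooks/*"]),
    ("team", ["src/*", "design.org", "skills/*"]),
]


def owner_of(path: str) -> str:
    """Return the writer whose territory contains `path`, or "unknown"."""
    for owner, patterns in TERRITORIES:
        if any(fnmatch(path, pattern) for pattern in patterns):
            return owner
    return "unknown"


def out_of_lane(writer: str, changed_paths: list[str]) -> list[str]:
    """List the changed paths that fall outside `writer`'s territory."""
    return [p for p in changed_paths if owner_of(p) != writer]
```

Feeding it the paths from a commit's diff gives the list of crossings to look at; it flags, it does not block, which matches the convention-over-enforcement stance of the design.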
And the def itself — the thing that decides who the agent is — is just
another file in this tree, written in the same org grammar
as everything else: an :agent: node with :MODEL: and
:TOOLKITS: properties and a ** System prompt heading.
No bespoke format. The lander's def declares
:MODEL: anthropic/claude-opus-4.8,
:TOOLKITS: git sandbox — and states the whole thesis in one
line: no database, no CMS framework: files in this repo ARE the CMS.
the board: PLAN.ORG
The agent doesn't keep its plan in a context window that vanishes when the
run ends. It keeps it in a file, plan.org, in the same repo as
everything else — and that file is a workflow in
exactly the sense the parent lesson means. Here is a real board, the actual
genesis file the living-lander agent started from:
#+TODO: TODO NEXT DOING | DONE CANCELLED
* board
** TODO objective: know the landscape
*** NEXT strategy: initial landscape report — AI app builders cohort (Lovable, v0, Cursor, Bolt, Replit)
*** TODO strategy: internal-tools cohort (hex.tech, Retool, Airtable, PowerApps)
** TODO objective: page completeness
*** TODO 99-faq.svelte — honest faq (faqs live last; source-only commit)
* log
- 2026-06-10 (team): genesis — fresh history, Svelte source rail, roadmap seeded
The first line declares the state set, and it isn't decoration: the
runtime's workflow engine parses exactly these keywords —
TODO NEXT WAITING DOING STARTED BLOCKED as live states,
DONE CANCELLED CANCELED as done. The agent doesn't invent a
status model; it writes into one the engine already understands.
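To make that concrete, here is a rough sketch of reading the `#+TODO:` line and checking it against the keyword sets quoted above. It follows the usual org convention that keywords before the `|` are live and keywords after it are done; the function names are hypothetical and the real engine's parser may well differ:

```python
# Keyword sets as quoted in the text above.
LIVE = {"TODO", "NEXT", "WAITING", "DOING", "STARTED", "BLOCKED"}
DONE = {"DONE", "CANCELLED", "CANCELED"}


def parse_todo_line(line: str) -> tuple[list[str], list[str]]:
    """Split a '#+TODO:' header into (live keywords, done keywords)."""
    prefix = "#+TODO:"
    if not line.startswith(prefix):
        raise ValueError("not a #+TODO: line")
    live_part, _, done_part = line[len(prefix):].partition("|")
    return live_part.split(), done_part.split()


def unknown_keywords(line: str) -> list[str]:
    """Keywords the board declares that fall outside the quoted sets."""
    live, done = parse_todo_line(line)
    return [k for k in live if k not in LIVE] + [k for k in done if k not in DONE]
```

For the genesis board, `unknown_keywords("#+TODO: TODO NEXT DOING | DONE CANCELLED")` comes back empty, which is the point: the board declares nothing the engine has to guess at.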
One task's whole life is a sequence of state changes in place. On
a run, the agent applies the newest dream's verdicts mechanically (more on
that below), picks the first NEXT task — or the top
TODO if none — marks it DOING, does the work, and
marks it DONE in the same commit that ships the work:
flowchart LR
todo["TODO — backlog"]
next["NEXT — promoted by a dream verdict"]
doing["DOING — claimed this run"]
done["DONE — shipped, same commit"]
cancel["CANCELLED — dropped"]
todo --> next --> doing --> done
todo -. "pick up: …" .-> next
todo -. "cancel" .-> cancel
next -. "cancel" .-> cancel
style done fill:#13d943,stroke:#121316,stroke-width:2.5px
style doing fill:#9fc4e8,stroke:#121316
style cancel fill:#d9dbd3,stroke:#121316
style todo fill:#ffffff,stroke:#121316
style next fill:#ffffff,stroke:#121316
So the genesis line *** NEXT strategy: initial landscape report…
becomes *** DOING strategy: initial landscape report… when the
agent claims it, then *** DONE strategy: initial landscape report…
in the commit that ships the report — and the * log gains one
dated dash. The law in the def is blunt: state changes ARE the workflow;
never delete tasks, move their state. A deleted task leaves no trace; a
moved state is a diff. The whole board is therefore replayable from
git log — you can watch the plan think.
In a fleet, claiming is a convention, not a lock. An agent grabbing
a task writes the state change plus an :AGENT: property and
commits it before doing the work, so peers see the claim in git. The
runtime never locks a task — it only isolates runs. The coordination is
social, and the medium is the same shared repo.
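A claim, then, is just a text edit that gets committed early. The sketch below assumes the `:AGENT:` property sits in a standard org property drawer directly under the headline; the exact placement in the real boards is not shown in this lesson, and `claim` is a hypothetical helper.

```python
import re

HEADLINE = re.compile(r"^(\*+) (\w+) (.*)$")
CLAIMABLE = {"TODO", "NEXT"}


def claim(lines: list[str], index: int, agent: str) -> list[str]:
    """Mark a task DOING and record who claimed it.

    Raises ValueError if the task is not in a claimable state, which is
    how a peer's earlier claim surfaces once its commit has been pulled.
    """
    match = HEADLINE.match(lines[index])
    if match is None or match.group(2) not in CLAIMABLE:
        raise ValueError("task is not claimable")
    stars, _, title = match.groups()
    drawer = [":PROPERTIES:", f":AGENT: {agent}", ":END:"]
    return lines[:index] + [f"{stars} DOING {title}"] + drawer + lines[index + 1:]
```

The check only helps if the agent has the peer's commit in its working tree, so it narrows the race window rather than closing it.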
recall without a STORE
Depth rung. If the files are the memory, how does the agent find
the right one among hundreds? Not with a stored index. The search
tool is stateless: on every single call it globs the working directory,
chunks the current files, embeds them, ranks by cosine similarity, and returns
the top five. There is no index sitting on disk to fall out of date — which is
the entire point. You search the actual files the agent is working in right
now, so results always reflect the current truth. There is no second
note store to drift, because there is no note store at all.
flowchart LR
q["query — recall by meaning"]
glob["glob the workdir<br/>(.org .md .ex .svelte .js …)"]
chunk{"chunk by kind"}
org["org files: one chunk<br/>per heading section"]
txt["other text: 20-line windows"]
embed["embed query + every chunk<br/>(no stored vectors)"]
rank["cosine rank → top 5<br/>path · headline · 240-char snippet"]
q --> embed
glob --> chunk
chunk --> org --> embed
chunk --> txt --> embed
embed --> rank
style q fill:#9fc4e8,stroke:#121316,stroke-width:2.5px
style rank fill:#13d943,stroke:#121316
style embed fill:#ffffff,stroke:#121316
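The ranking half of that flow fits in a few lines. This sketch swaps the real embedding model for a toy hashed bag-of-words vector (`toy_embed` is an assumption made purely so the example runs offline); what it illustrates is the shape: embed the query and every chunk on each call, keep nothing between calls, return the top five with a 240-character snippet.

```python
import math
import re
import zlib

DIM = 256  # size of the toy vector; a real model picks its own


def toy_embed(text: str) -> list[float]:
    """Stand-in embedding: hashed bag of words. Not a real model."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query, chunks, top_k=5, embed=toy_embed):
    """Rank (path, headline, text) chunks against the query.

    Every chunk is embedded fresh on each call; no vectors are stored.
    Returns (score, path, headline, snippet) tuples, best first.
    """
    q = embed(query)
    scored = [
        (cosine(q, embed(text)), path, headline, text[:240])
        for path, headline, text in chunks
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]
```

Passing a different `embed` callable is the only change needed to try a real model; the statelessness comes from the function signature, not from the embedding.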
The chunking is org-aware, and that's not incidental. An .org
file chunks into one chunk per heading section — the headline is
extracted, the tags stripped — so a search hit lands on a whole coherent
thought, not a window that happens to straddle two ideas. Every other text
file chunks into plain 20-line windows. Either way, embedding happens at query
time over what's there now.
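Here is roughly what that two-way chunking could look like. It is a sketch under stated assumptions: windows do not overlap, tags are the trailing `:tag:` form on a headline, and anything before the first heading becomes a chunk with an empty headline. None of these details are confirmed by the runtime.

```python
import re

ORG_HEADING = re.compile(r"^\*+ (.*)$")
TRAILING_TAGS = re.compile(r"\s+:[\w@:]+:\s*$")


def chunk_org(text: str) -> list[tuple[str, str]]:
    """One (headline, section text) chunk per org heading section."""
    chunks, headline, body = [], "", []
    for line in text.splitlines():
        match = ORG_HEADING.match(line)
        if match:
            if body:  # close the section that was open
                chunks.append((headline, "\n".join(body)))
            headline = TRAILING_TAGS.sub("", match.group(1))
            body = [line]
        else:
            body.append(line)
    if body:
        chunks.append((headline, "\n".join(body)))
    return chunks


def chunk_windows(text: str, size: int = 20) -> list[str]:
    """Plain fixed-size line windows for every non-org text file."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def chunk_file(path: str, text: str) -> list[tuple[str, str]]:
    """Dispatch on file kind; windows carry an empty headline."""
    if path.endswith(".org"):
        return chunk_org(text)
    return [("", window) for window in chunk_windows(text)]
```

The payoff of the org branch is that a hit carries its own headline, so the result line can name the thought it landed on instead of a line range.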
This is why "to remember, write an org file" is literally true. There's no
separate "save to memory" verb. You vfs_write a section into an
org file and it is, by that act, searchable — it joins the corpus the next
query embeds. Memory has exactly three verbs and you already know all of them:
read a file, search the files by meaning, and git log
the history. There is a heavier, indexed search lane in the runtime —
hybrid vector-plus-literal fusion over a stored index — but it's deliberately
reserved for stored workbooks, not the agent's in-run
recall, precisely to avoid the drift a stored index introduces.
commit means LIVE
The agent never runs git. Its entire git capability is a single
host-brokered tool, granted only when the agent carries the
exec: true trust flag — and even then, the agent supplies one
thing only: the commit message. Here is the whole call the model makes:
{"name": "git", "arguments": {
"message": "add: comparison section — a page that dies vs a page that lives"
}}
The host owns the command line. On that one call it runs, in order:
ensure the .gitignore, git add -A, commit with hooks
disabled and the tenant's identity (<tenant>@workbooks.local),
publish content/** and blog/** to the live
site root, then git push origin HEAD. The tool output the model
sees is one line:
committed + pushed: 3fa9c12 (pushed) (published 2)
sequenceDiagram
participant A as agent (picks the message)
participant H as host (picks the command line)
participant G as git
participant S as SitePublish
participant L as live site + public timeline
A->>H: git { message }
H->>G: add -A · commit · push origin HEAD
H->>S: publish content/** + blog/** → live root
S->>L: mirror to the served tree
G->>L: sha lands on the public changelog
H-->>A: 3fa9c12 (pushed) (published 2)
The spine of that walkthrough is atomicity. Commit implies publish, in the
same call — and that's not tidiness, it's a scar. The lander once shipped
blogs that were committed but returned 404, because the run died before a
separate publish step ever ran. So the seam was redrawn: there is no separate
publish step anymore. If it's committed, it's live. The failure shapes the
model might instead see are just as plain —
nothing to commit (working tree clean),
(push failed: …), or for an untrusted agent
git not permitted (no exec capability).
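The division of labour can be sketched as a pure function that turns the one thing the agent supplies into the steps the host would run. Everything here is illustrative: `host_git_call` is a hypothetical name, `user.name` set to the tenant is a guess (the text only gives the email form), and `--no-verify` is one way to approximate "hooks disabled", since it skips the pre-commit and commit-msg hooks.

```python
def host_git_call(trust: dict, tenant: str, message: str):
    """Return the ordered host-side steps, or the refusal string.

    Each step is (label, argv); argv is None for steps that are not a
    git command in this sketch.
    """
    if not trust.get("exec", False):
        return "git not permitted (no exec capability)"
    # Per-command identity, so the tenant's name never leaks into global config.
    identity = [
        "-c", f"user.name={tenant}",  # assumption: name mirrors the tenant
        "-c", f"user.email={tenant}@workbooks.local",
    ]
    return [
        ("ensure .gitignore", None),
        ("stage", ["git", "add", "-A"]),
        ("commit", ["git", *identity, "commit", "--no-verify", "-m", message]),
        ("publish content/** and blog/**", None),
        ("push", ["git", "push", "origin", "HEAD"]),
    ]
```

The agent's message only ever lands as the single argument after `-m`; it never becomes part of the command line's structure, which is the whole safety argument in miniature.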
The deeper point: the agent chooses what to say, never what
runs. The native run/bash hatch was deleted entirely — it is
by construction impossible for the agent to execute native code — and path
containment blocks any .. or absolute escape on every file read
and write. The agent's reach into its own memory is total; its reach outside
it is zero.
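A containment check of this kind can be very small. The sketch below is a lexical check only, written as an assumption about what "blocks any .. or absolute escape" means; a real host would also need to think about symlinks, which this does not.

```python
from pathlib import PurePosixPath


def contained(rel_path: str) -> bool:
    """True if the path stays inside the working directory, lexically.

    Rejects absolute paths and any path with a '..' component.
    """
    path = PurePosixPath(rel_path)
    if path.is_absolute():
        return False
    return ".." not in path.parts
```

Rejecting every `..` component is stricter than resolving the path first (it refuses `content/../plan.org`, which never leaves the tree), but it is trivially auditable, and that is the better trade for a boundary.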
the dream journal: REM/
Depth rung. The agent has a problem any long-lived worker has: each run starts cold, and re-reading the whole world every time is expensive and error-prone. The answer is a memory-consolidation process the agent never runs itself — a separate, small dream model that wakes after the work does, digests what just happened, and writes a single org file the next run can trust instead of re-deriving the world.
flowchart TD
gl["git log — last 12 oneline"]
pl["plan.org — first 4000 chars"]
st["_steps.jsonl — last 25 telemetry lines"]
pd["the previous dream — 2500 chars"]
model["the dream model (mercury-2, temp 0.8)"]
out["rem/YYYY-MM-DD-HHMM.org<br/>six fixed headings"]
commit["committed + pushed as: rem: …"]
gl --> model
pl --> model
st --> model
pd --> model
model --> out --> commit
style model fill:#9fc4e8,stroke:#121316,stroke-width:2.5px
style out fill:#ffffff,stroke:#121316
style commit fill:#13d943,stroke:#121316
That flow gathers four inputs — the recent git log, the top of the board, the last 25 lines of raw telemetry, and the previous dream — and produces one entry with six fixed, parseable headings. A malformed dream is discarded rather than written, so the next run can always count on the shape. A real entry:
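The "discard if malformed" rule is easy to picture as a shape check. This sketch assumes the six headings are the ones in the sample entry, that they must appear as top-level headings, and that order matters; the runtime's actual validation is not shown in this lesson.

```python
# Heading names taken from the sample entry in this lesson.
REQUIRED = ["tale", "goals", "blue sky", "fears", "verdicts", "carry"]


def dream_headings(text: str) -> list[str]:
    """Top-level org headings of a dream entry, in order."""
    return [line[2:].strip() for line in text.splitlines() if line.startswith("* ")]


def is_well_formed(text: str) -> bool:
    """True only if exactly the six expected headings appear, in order."""
    return dream_headings(text) == REQUIRED
```

A strict equality check means a dream with a missing, extra, or reordered heading is rejected outright, so a reader of `rem/` never has to handle a partial shape.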
#+TITLE: rem — 2026-06-11 09:15 UTC
#+MODEL: inception/mercury-2
* tale
Three add runs shipped the comparison section and one field-note; the audit cut a duplicated kicker in 05-comparison.html and fixed two dead links.
* goals
- finish the honest faq (99-faq) — pinned last
* blue sky
- a living-proof section driven by the real timeline feed
* fears
- repeating the comparison idea in a future section — sections.org must gate it
* verdicts
- pick up: 99-faq.svelte — honest faq — the audit flagged visitor questions
- keep course — the board order is right
* carry
- DOING: nothing mid-flight; next action: stage NEEDS for 99-faq in plan.org's log
- verified this cycle: content/sections.json rows all serve 200 — do not re-check
Two of those headings are load-bearing. * verdicts are
machine-applied board moves — pick up: 99-faq becomes a
promotion to NEXT on the next run, mechanically. And
* carry is the resume state: a handoff note that tells the
next waking run what's mid-flight and what's already been verified, so it can
skip re-reading the world. The next run's step zero reads the newest
rem/NN.org — found via rem/manifest.json, because
there's no ls in the WASM shell — applies the verdicts to the
board, and trusts the carry. Memory consolidation is a file handoff, committed
as rem: … so even the dreaming lands on the public timeline.
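Step zero's "apply the verdicts" can be sketched as two small functions. The matching rule is an assumption: the text after `pick up:` up to the first dash separator is treated as a key, and the first TODO headline containing that key is promoted. The runtime's real matching is not documented here, and only the `pick up:` verdict is handled.

```python
import re

DASH = "\u2014"  # the separator used inside verdict lines
TODO_HEADLINE = re.compile(r"^(\*+) TODO (.*)$")


def parse_verdicts(dream_text: str) -> list[str]:
    """Collect the bullet lines under the top-level 'verdicts' heading."""
    verdicts, inside = [], False
    for line in dream_text.splitlines():
        if line.startswith("* "):
            inside = line[2:].strip() == "verdicts"
        elif inside and line.startswith("- "):
            verdicts.append(line[2:].strip())
    return verdicts


def apply_verdicts(board_lines: list[str], verdicts: list[str]) -> list[str]:
    """Promote TODO tasks named by 'pick up:' verdicts to NEXT."""
    board = list(board_lines)
    for verdict in verdicts:
        if not verdict.startswith("pick up:"):
            continue  # e.g. "keep course" changes nothing
        key = verdict[len("pick up:"):].split(DASH)[0].strip()
        if not key:
            continue
        for i, line in enumerate(board):
            match = TODO_HEADLINE.match(line)
            if match and key in match.group(2):
                board[i] = f"{match.group(1)} NEXT {match.group(2)}"
                break
    return board
```

Because the function is deterministic over two text files, the same dream applied to the same board always yields the same diff, which is what "mechanically" buys you.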
It's time-gated, too. A full dream only fires after an audit:
commit and a roughly fifty-minute minimum interval; otherwise the agent leaves
a lighter ephemeral daydream — forty words or fewer, written only to the
public site, never committed. The rest of this idea is big enough that
dreaming gets its own lesson; this page just shows you
where the dreams live.
the canon SHELF
Not all of the repo records what happened — some of it constrains what's allowed to happen. The canon files are read-by-reference memory: the agent reads them at the start of a run and writes from them, but never reinvents them. They're the difference between an agent that remembers facts and one that remembers judgment.
| canon file | what it governs | read when |
|---|---|---|
| context.org | product truth — the three layers, banned copy, vocabulary, pitch rules | every run, before writing a word |
| design.org | the design canon — POV, the Fraunces type axes, the sacred green | before any visual change |
| design-gate.org | a scored taste gate — six bars, pass is ≥9/12 with no zero and no reject trigger | before committing a design change |
| sections.org | the page map and the anti-sprawl law — one place per idea | before adding a section |
| skills/*.org | per-task know-how — seo, content, tweets, icons | only before the run kind that needs it |
The design gate is worth a beat because it shows canon doing real work: it's
a two-pass scored gate, six bars worth zero-to-two each, and a pass requires at
least nine of twelve, no single bar at zero, and none of the six
adversarial reject triggers. The score gets pasted into the commit body, e.g.
gate: depth 2 · restraint 2 · one-idea 2 · motion 1 · green 2 · earns 2 =
11/12, 0 triggers — so the taste judgment is itself on the timeline,
diffable like everything else.
And because there's no ls in the sandbox, the agent navigates
by manifest, not by listing: content/sections.json and
content/blog.json enumerate every section and post;
rem/manifest.json lists the newest fifty dreams, sorted so the agent
reads the latest without scanning. The orientation budget is deliberately
tight — orient in at most three reads, then work — because the dream's
* carry already told it most of what it needs.
two-way street: GITOPS
Depth rung. If the agent's memory is a git repo, then a human pushing to
that repo is editing the agent's memory — and that has to work without either
side clobbering the other. With GitOps enabled, before each run the host pulls
and merges (never overwrites) any human or CI pushes. If the agent has
uncommitted work in flight, the host snapshots it first
(wip: snapshot before reconcile) so nothing is lost in the merge.
sequenceDiagram
participant Hu as human / CI
participant K as keeper tick
participant R as the repo (memory)
participant L as live site
Hu->>R: push code (src/, the def, design.org)
K->>R: snapshot dirty agent work — wip: snapshot before reconcile
K->>R: fetch + merge (never overwrite)
Note over R: CODE paths and DATA paths<br/>replay cleanly, side by side
K->>R: agent run commits DATA (content/, blog/, rem/)
R->>L: commit ⇒ publish ⇒ live
The reason those merges replay cleanly is a deliberate split in the tree:
CODE paths — the app src/, the agent def,
design.org, skills/ — are the human's lane, and
DATA paths — content/, blog/, the board,
rem/ — are the agent's. Humans mostly touch code; agents mostly
touch data; the two rarely collide on the same lines. When they genuinely do,
the host does not guess — it aborts the merge and reports the conflicted
files as work left for a human. A conflict is surfaced, never silently
resolved.
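In git terms, the reconcile is a short and fairly conventional sequence. The sketch below only builds the command lists; the specific commands are one plausible way to get the behaviour described, not the host's actual implementation, and the function names are hypothetical.

```python
def reconcile_plan(dirty: bool) -> list[list[str]]:
    """Commands to run before an agent run, in order."""
    steps = []
    if dirty:
        # Snapshot in-flight agent work so the merge cannot lose it.
        steps.append(["git", "add", "-A"])
        steps.append(["git", "commit", "-m", "wip: snapshot before reconcile"])
    steps.append(["git", "fetch", "origin"])
    # Merge what was just fetched; never reset or force over local history.
    steps.append(["git", "merge", "--no-edit", "FETCH_HEAD"])
    return steps


def on_conflict() -> list[list[str]]:
    """Commands to run when the merge stops on a conflict."""
    return [
        # List the unmerged files first; the list is gone after the abort.
        ["git", "diff", "--name-only", "--diff-filter=U"],
        ["git", "merge", "--abort"],
    ]
```

The ordering inside `on_conflict` matters: the conflicted file names have to be captured before the abort restores the tree, or there is nothing left to report to the human.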
This is the property that makes the whole "trust but verify" claim real. You
don't take the agent's word for what it remembers — you git clone
the repo and read it, the same files the agent reads, and you can push a
correction straight into its memory. The repo is a two-way street because
memory should be inspectable by the people the agent works with.
where it BITES
Honesty section. Statelessness has a price: re-embedding the working
directory on every search call costs one embedding pass over every chunk, per query. That's
exactly right for a workdir of dozens-to-hundreds of files, and exactly wrong
for a data lake — which is why the indexed, stored-vector search lane exists
for that case. The in-run agent simply isn't that case, and pays the small
cost to keep its recall drift-free.
Board claims are convention, not locks. In a fleet, two agents can
grab the same task in the same instant — the runtime isolates runs, it does not
serialize them. The :AGENT: property and the commit-before-work
rule make a collision visible and rare, but the discipline lives in the agent
definition, and a poorly written agent can ignore it. The protocol only works
because the model follows it.
Which is the honest core of all of this: the seams are real, but the model
still has to use them — and we have the scars to prove the failure
modes are real, not hypothetical. A run once wrote a complete post, wired its
manifest, then said NO-WORK and never committed — the finished
work sat dead on disk, because committing nothing was easier than committing
something. Another time the agent defaulted to the wrong tenant and four posts
published to the wrong site root and 404'd. For days, agents wrote stories that
never shipped because git errored on directory ownership. And the lander once
spent six hours calling a missing tool "env-gated" instead of filing an
issue about it. Every one of those is now a guardrail in the code — the
atomic commit-publish, the explicit tenant threading, the ownership fix, the
file_issue tool — but they're in this lesson because the seams were
earned, not designed in a vacuum. A page about a memory you can audit
should be auditable about its own past.
questions people actually ASK
Is this just RAG over a folder?
No. A stored index is the usual heart of a RAG setup, and here there is
none. Recall is one of three memory verbs, not the whole thing: you
read a file by path, search the current files by meaning
(embedded fresh each call, nothing cached), and git log the
history. RAG retrieves from a store that drifts; this embeds the live files,
so there's nothing to drift from.
What if two agents grab the same task?
It's possible — the runtime isolates runs but doesn't lock tasks. The
convention is that an agent commits a state change plus an
:AGENT: property before it works, so peers see the
claim in git and back off. It's social coordination over a shared repo, not
a database lock, and it's only as reliable as the agents following it.
Can I read the repo myself?
Yes — that's the entire point. Clone it and you read the same files the
agent reads: the board, the canon, the dreams, the telemetry. Every commit
is on a public timeline, so any claim the agent makes about its own work is a
git show <sha> away from being verified or refuted. And you
can push a correction straight into its memory.
Where do secrets live, if it's all in a repo?
Never in the repo. The runtime writes a .gitignore that keeps
session and secret data out of version control, and the per-tenant signing
key lives untracked inside the tree, restored deterministically from a host
secret across redeploys. The same .gitignore is the share
boundary — packing the repo for anyone else drops exactly what's ignored.
Doesn't embedding every file on every search get slow?
For a working directory, no — it's dozens to hundreds of files, and the freshness is worth the cost. For a genuinely large corpus you'd want the indexed search lane, which exists precisely for stored workbooks. The agent's in-run recall deliberately isn't that lane, to keep results matched to the current files.
Who writes the dreams — does the agent journal about itself?
No. The agent never writes rem/ — a separate sleep process
does, after the work. It digests the git log, the board, the telemetry, and
the previous dream into one entry with six fixed headings. The next run
applies its verdicts to the board and trusts its carry note instead of re-reading the world.
Dreaming is big enough to have its own lesson.
keep GOING
This sub-lesson lives under agents, and everything in the repo is written in one grammar — if the context repo made sense, both parents will too.