
the working DIRECTORY is the memory

Every agent memory you've met is a bolt-on: a vector store with its own schema, drifting from reality, dying with the vendor. Here there is no second system. An agent's memory is its working directory, and that directory is a git repo of plain org files. Remembering is a write. Recalling embeds the current files. "What did you do?" is git log. And every commit is the publish step: committed always means live.


memory is always a SECOND system

Every agent memory you've seen is a bolt-on. A vector database the agent writes embeddings into. A "memory" API with its own schema and its own retention rules. A store that slowly drifts away from what's actually true, that nobody can open and read, and that dies the day the vendor does. The shape is always the same: the work lives in one place and the memory of the work lives in another, and nothing structural connects them.

The cost shows up the moment you want to check the agent. When it forgets, misremembers, or claims it did something it didn't, there's no artifact to diff against — the memory is a black box with an embedding index inside it. You can ask it what it remembers; you can't audit what it remembers. The most context-hungry collaborators we've ever had arrived, and the industry's answer was to give them a memory nobody else can read.

This page is one answer to that, and it's almost embarrassingly plain: delete the second system entirely. Let the agent's memory be the one thing every developer already knows how to read, diff, and revert — a directory of files under version control.

the DEFINITION

con·text re·po /ˈkɑn·tɛkst ˈri·poʊ/ noun

1. an agent's working directory that is also a git repository of plain files — where the files are the memory and the commits are the changelog. The agent remembers by writing a file, recalls by embedding the files it has now, and reports what it did with git log. No separate store.

The agent's own search tool says it in the runtime, verbatim: recall by meaning — semantic search over the org/code files in your working context; no separate memory store; the files ARE the memory. To remember something, write it as an org file — it becomes searchable automatically. That isn't marketing copy. It's the tool description the model reads at runtime, and the rest of this lesson is the machinery that makes it true.

one tree, five WRITERS

The repo is one directory — WB_DATA/<tenant>, a real git repo the runtime git inits the first time an agent runs there. But five different writers contribute to it, each owning its own paths, and the whole design rests on keeping those territories clear. Who writes what:

| path | written by | read by | committed? |
| --- | --- | --- | --- |
| plan.org · content/ · blog/ | the agent | everyone: peers, humans, its own next run | yes |
| src/ shell · the agent def · design.org · skills/ | the team (humans + CI) | the agent, as read-only canon | yes |
| rem/*.org · rem/manifest.json | the dream process (a separate sleep model) | the agent's next run, at step 0 | yes |
| _steps.jsonl · /events.org | the runtime (automatically, every tool call) | the dream process; humans auditing | yes |
| .gitignore · .workbooks/<tenant>.ed25519 · session data | the runtime | the host only | never (gitignored) |

Two facts in that table do a lot of work. First, the agent's territory is prose-law in its own definition: your territory is src/sections/grown/, the content/** partials, blog/, plan.org, strategy/, and nothing else. The boundary isn't enforced by the runtime; it's written into the agent and visible in the diff when it's crossed. Second, the bottom row never enters version control: the per-tenant Ed25519 keypair that backs the agent's did:key identity lives inside the same tree but stays untracked. The .gitignore the runtime writes is doing double duty. It's the privacy boundary and the share boundary, because packing the repo for anyone else drops exactly the parts git check-ignore flags. The agent's native git instinct is the security model.

And the def itself — the thing that decides who the agent is — is just another file in this tree, written in the same org grammar as everything else: an :agent: node with :MODEL: and :TOOLKITS: properties and a ** System prompt heading. No bespoke format. The lander's def declares :MODEL: anthropic/claude-opus-4.8, :TOOLKITS: git sandbox — and states the whole thesis in one line: no database, no CMS framework: files in this repo ARE the CMS.

the board: PLAN.ORG

The agent doesn't keep its plan in a context window that vanishes when the run ends. It keeps it in a file, plan.org, in the same repo as everything else — and that file is a workflow in exactly the sense the parent lesson means. Here is a real board, the actual genesis file the living-lander agent started from:

#+TODO: TODO NEXT DOING | DONE CANCELLED

* board
** TODO objective: know the landscape
*** NEXT strategy: initial landscape report — AI app builders cohort (Lovable, v0, Cursor, Bolt, Replit)
*** TODO strategy: internal-tools cohort (hex.tech, Retool, Airtable, PowerApps)
** TODO objective: page completeness
*** TODO 99-faq.svelte — honest faq (faqs live last; source-only commit)

* log
- 2026-06-10 (team): genesis — fresh history, Svelte source rail, roadmap seeded

The first line declares the state set, and it isn't decoration: the runtime's workflow engine parses exactly these keywords — TODO NEXT WAITING DOING STARTED BLOCKED as live states, DONE CANCELLED CANCELED as done. The agent doesn't invent a status model; it writes into one the engine already understands.
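
As a rough illustration of what "parses exactly these keywords" could mean, here is a minimal Python sketch that reads headlines from a board and classifies each state as live or done. The keyword sets come from the text above; the function name `parse_tasks` and the regular expression are assumptions for illustration, not the runtime's actual parser.

```python
import re

# Keyword sets as listed in the text; the engine's real parser is not shown.
LIVE = {"TODO", "NEXT", "WAITING", "DOING", "STARTED", "BLOCKED"}
DONE = {"DONE", "CANCELLED", "CANCELED"}

# Stars, one uppercase word, then the title (hypothetical pattern).
HEADLINE = re.compile(r"^(\*+)\s+([A-Z]+)\s+(.*)$")


def parse_tasks(org_text):
    """Return (depth, state, title, is_done) for each headline with a known keyword."""
    tasks = []
    for line in org_text.splitlines():
        match = HEADLINE.match(line)
        if not match:
            continue
        stars, word, title = match.groups()
        if word in LIVE or word in DONE:
            tasks.append((len(stars), word, title, word in DONE))
    return tasks
```

Headlines without a recognized keyword, such as `* board` and `* log`, are skipped, so only real tasks come back.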

One task's whole life is a sequence of state changes in place. On a run, the agent applies the newest dream's verdicts mechanically (more on that below), picks the first NEXT task — or the top TODO if none — marks it DOING, does the work, and marks it DONE in the same commit that ships the work:

flowchart LR
  todo["TODO — backlog"]
  next["NEXT — promoted by a dream verdict"]
  doing["DOING — claimed this run"]
  done["DONE — shipped, same commit"]
  cancel["CANCELLED — dropped"]
  todo --> next --> doing --> done
  todo -. "pick up: …" .-> next
  todo -. "cancel" .-> cancel
  next -. "cancel" .-> cancel
  style done fill:#13d943,stroke:#121316,stroke-width:2.5px
  style doing fill:#9fc4e8,stroke:#121316
  style cancel fill:#d9dbd3,stroke:#121316
  style todo fill:#ffffff,stroke:#121316
  style next fill:#ffffff,stroke:#121316
  

So the genesis line *** NEXT strategy: initial landscape report… becomes *** DOING strategy: initial landscape report… when the agent claims it, then *** DONE strategy: initial landscape report… in the commit that ships the report — and the * log gains one dated dash. The law in the def is blunt: state changes ARE the workflow; never delete tasks, move their state. A deleted task leaves no trace; a moved state is a diff. The whole board is therefore replayable from git log — you can watch the plan think.

In a fleet, claiming is a convention, not a lock. An agent grabbing a task writes the state change plus an :AGENT: property and commits it before doing the work, so peers see the claim in git. The runtime never locks a task — it only isolates runs. The coordination is social, and the medium is the same shared repo.
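
The claim convention can be sketched as a pure text edit: move the state in place and record the claimer. This is a minimal sketch under assumptions: the helper name `claim`, matching a task by title prefix, and writing the :AGENT: property into a fresh :PROPERTIES: drawer are all illustrative choices, and a task that already has a drawer would need merging rather than a second one.

```python
def claim(org_text, title_prefix, agent):
    """Move the first TODO/NEXT headline whose title starts with title_prefix to DOING.

    Returns (new_text, claimed). The task is never deleted, only its state moves.
    """
    lines = org_text.splitlines()
    for i, line in enumerate(lines):
        stars, _, rest = line.partition(" ")
        if not stars or set(stars) != {"*"}:
            continue  # not a headline
        state, _, title = rest.partition(" ")
        if state in ("TODO", "NEXT") and title.startswith(title_prefix):
            lines[i] = f"{stars} DOING {title}"
            # Assumed layout: a standard org property drawer under the headline.
            lines[i + 1:i + 1] = [":PROPERTIES:", f":AGENT: {agent}", ":END:"]
            return "\n".join(lines) + "\n", True
    return org_text, False
```

The edit only becomes a claim once it is committed, since peers see it through git rather than through any lock.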

recall without a STORE

Depth rung. If the files are the memory, how does the agent find the right one among hundreds? Not with a stored index. The search tool is stateless: on every single call it globs the working directory, chunks the current files, embeds them, ranks by cosine similarity, and returns the top five. There is no index sitting on disk to fall out of date — which is the entire point. You search the actual files the agent is working in right now, so results always reflect the current truth. There is no second note store to drift, because there is no note store at all.

flowchart LR
  q["query — recall by meaning"]
  glob["glob the workdir (.org .md .ex .svelte .js …)"]
  chunk{"chunk by kind"}
  org["org files: one chunk per heading section"]
  txt["other text: 20-line windows"]
  embed["embed query + every chunk (no stored vectors)"]
  rank["cosine rank → top 5: path · headline · 240-char snippet"]
  q --> embed
  glob --> chunk
  chunk --> org --> embed
  chunk --> txt --> embed
  embed --> rank
  style q fill:#9fc4e8,stroke:#121316,stroke-width:2.5px
  style rank fill:#13d943,stroke:#121316
  style embed fill:#ffffff,stroke:#121316

The chunking is org-aware, and that's not incidental. An .org file chunks into one chunk per heading section — the headline is extracted, the tags stripped — so a search hit lands on a whole coherent thought, not a window that happens to straddle two ideas. Every other text file chunks into plain 20-line windows. Either way, embedding happens at query time over what's there now.
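
The two chunking rules can be sketched in a few lines of Python. This is an illustration under assumptions, not the runtime's chunker: the function names, the tag-stripping pattern, and the choice to keep any pre-heading preamble as its own chunk are all hypothetical.

```python
import re

TAGS = re.compile(r"\s+:[\w@:]+:\s*$")  # trailing org tags such as ":agent:draft:"
HEAD = re.compile(r"^\*+\s+(.*)$")      # any org headline


def chunk_org(text):
    """One chunk per heading section, as (headline without tags, section text)."""
    chunks, headline, body = [], None, []

    def flush():
        if headline is not None or any(l.strip() for l in body):
            chunks.append((headline or "", "\n".join(body).strip()))

    for line in text.splitlines():
        match = HEAD.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            headline, body = TAGS.sub("", match.group(1)), [line]
        else:
            body.append(line)
    flush()
    return chunks


def chunk_windows(text, size=20):
    """Every other text file: plain fixed-size line windows."""
    lines = text.splitlines()
    return [("", "\n".join(lines[i:i + size])) for i in range(0, len(lines), size)]


def chunk_file(path, text):
    return chunk_org(text) if path.endswith(".org") else chunk_windows(text)
```

A hit on an org chunk therefore carries a headline to show, while a hit on a window chunk carries only the text.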

This is why "to remember, write an org file" is literally true. There's no separate "save to memory" verb. You vfs_write a section into an org file and it is, by that act, searchable — it joins the corpus the next query embeds. Memory has exactly three verbs and you already know all of them: read a file, search the files by meaning, and git log the history. There is a heavier, indexed search lane in the runtime — hybrid vector-plus-literal fusion over a stored index — but it's deliberately reserved for stored workbooks, not the agent's in-run recall, precisely to avoid the drift a stored index introduces.
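
To make the search-by-meaning verb concrete, here is a small stateless ranking sketch. The embedding is a stand-in: `toy_embed` is a bag-of-words counter used only so the example runs without a model, and the names `recall`, `top_k`, and `snippet` are assumptions. A real runtime would call an embedding model at the same point.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding (word counts). Assumption: a real model would go here."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query, chunks, embed=toy_embed, top_k=5, snippet=240):
    """Rank (path, headline, text) chunks against the query.

    Everything is embedded inside this call and nothing is kept afterwards,
    so there is no stored vector that could go stale.
    """
    q = embed(query)
    scored = [
        (cosine(q, embed(text)), path, headline, text[:snippet])
        for path, headline, text in chunks
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]
```

Because the corpus is passed in fresh on each call, a section written a moment ago is ranked alongside everything else with no extra "save" step.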

commit means LIVE

The agent never runs git. Its entire git capability is a single host-brokered tool, granted only when the agent carries the exec: true trust flag — and even then, the agent supplies one thing only: the commit message. Here is the whole call the model makes:

{"name": "git", "arguments": {
   "message": "add: comparison section — a page that dies vs a page that lives"
}}

The host owns the command line. On that one call it runs, in order: ensure the .gitignore, git add -A, commit with hooks disabled and the tenant's identity (<tenant>@workbooks.local), publish content/** and blog/** to the live site root, then git push origin HEAD. The tool output the model sees is one line:

committed + pushed: 3fa9c12 (pushed) (published 2)

sequenceDiagram
  participant A as agent (picks the message)
  participant H as host (picks the command line)
  participant G as git
  participant S as SitePublish
  participant L as live site + public timeline
  A->>H: git { message }
  H->>G: add -A · commit · push origin HEAD
  H->>S: publish content/** + blog/** → live root
  S->>L: mirror to the served tree
  G->>L: sha lands on the public changelog
  H-->>A: 3fa9c12 (pushed) (published 2)
  

That walkthrough's spine is the atomicity. Commit implies publish, in the same call — and that's not tidiness, it's a scar. The lander once shipped blogs that were committed but returned 404, because the run died before a separate publish step ever ran. So the seam was redrawn: there is no separate publish step anymore. If it's committed, it's live. The failure shapes the model might instead see are just as plain — nothing to commit (working tree clean), (push failed: …), or for an untrusted agent git not permitted (no exec capability).
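
The host side of that one call can be sketched as two small pieces: the git command lines, and the publish step that runs between commit and push. This is a sketch under assumptions. The exact flags the host uses are not shown in the source; `core.hooksPath=/dev/null` is one known way to disable hooks, setting `user.name` to the tenant is a guess, and `commit_commands` and `publish` are hypothetical names.

```python
import shutil
from pathlib import Path


def commit_commands(tenant, message):
    """Git argv lists in the order described; the agent supplies only `message`."""
    identity = ["-c", f"user.name={tenant}", "-c", f"user.email={tenant}@workbooks.local"]
    return [
        ["git", "add", "-A"],
        # Assumed way to disable hooks; the host's real flags are not documented here.
        ["git", *identity, "-c", "core.hooksPath=/dev/null", "commit", "-m", message],
        ["git", "push", "origin", "HEAD"],
    ]


def publish(repo, live_root, roots=("content", "blog")):
    """Mirror every file under content/ and blog/ into live_root; return files copied."""
    repo, live_root = Path(repo), Path(live_root)
    copied = 0
    for root in roots:
        if not (repo / root).is_dir():
            continue
        for src in sorted((repo / root).rglob("*")):
            if src.is_file():
                dst = live_root / src.relative_to(repo)
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
                copied += 1
    return copied
```

Keeping both steps inside one host function is what removes the window in which a run can die between commit and publish.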

The deeper point: the agent chooses what to say, never what runs. The native run/bash hatch was deleted entirely — it is by construction impossible for the agent to execute native code — and path containment blocks any .. or absolute escape on every file read and write. The agent's reach into its own memory is total; its reach outside it is zero.
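
Path containment of this kind is commonly implemented by resolving the requested path against the root and refusing anything that lands outside it. The following is a minimal sketch of that general technique; the function name `contain` and the choice of `PermissionError` are assumptions, and the runtime's own check is not shown in the source.

```python
from pathlib import Path


def contain(root, requested):
    """Resolve `requested` inside `root`, rejecting absolute paths and `..` escapes."""
    root = Path(root).resolve()
    candidate = Path(requested)
    if candidate.is_absolute():
        raise PermissionError(f"absolute path not allowed: {requested}")
    resolved = (root / candidate).resolve()
    # After resolution the path must be the root itself or live beneath it.
    if resolved != root and root not in resolved.parents:
        raise PermissionError(f"path escapes the working directory: {requested}")
    return resolved
```

Resolving before comparing means a path like `a/../b.org` is allowed while `a/../../x` is not.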

the dream journal: REM/

Depth rung. The agent has a problem any long-lived worker has: each run starts cold, and re-reading the whole world every time is expensive and error-prone. The answer is a memory-consolidation process the agent never runs itself — a separate, small dream model that wakes after the work does, digests what just happened, and writes a single org file the next run can trust instead of re-deriving the world.

flowchart TD
  gl["git log — last 12 oneline"]
  pl["plan.org — first 4000 chars"]
  st["_steps.jsonl — last 25 telemetry lines"]
  pd["the previous dream — 2500 chars"]
  model["the dream model (mercury-2, temp 0.8)"]
  out["rem/YYYY-MM-DD-HHMM.org: six fixed headings"]
  commit["committed + pushed as: rem: …"]
  gl --> model
  pl --> model
  st --> model
  pd --> model
  model --> out --> commit
  style model fill:#9fc4e8,stroke:#121316,stroke-width:2.5px
  style out fill:#ffffff,stroke:#121316
  style commit fill:#13d943,stroke:#121316

That flow gathers four inputs — the recent git log, the top of the board, the last 25 lines of raw telemetry, and the previous dream — and produces one entry with six fixed, parseable headings. A malformed dream is discarded rather than written, so the next run can always count on the shape. A real entry:

#+TITLE: rem — 2026-06-11 09:15 UTC
#+MODEL: inception/mercury-2

* tale
Three add runs shipped the comparison section and one field-note; the audit
cut a duplicated kicker in 05-comparison.html and fixed two dead links.
* goals
- finish the honest faq (99-faq) — pinned last
* blue sky
- a living-proof section driven by the real timeline feed
* fears
- repeating the comparison idea in a future section — sections.org must gate it
* verdicts
- pick up: 99-faq.svelte — honest faq — the audit flagged visitor questions
- keep course — the board order is right
* carry
- DOING: nothing mid-flight; next action: stage NEEDS for 99-faq in plan.org's log
- verified this cycle: content/sections.json rows all serve 200 — do not re-check
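
An entry with this fixed shape lends itself to a mechanical check before it is written. Here is a minimal sketch of such a validator. The heading names are taken from the entry above; the function name `parse_dream` and the rule that the headings must appear exactly once and in this order are assumptions about what "malformed" means.

```python
import re

HEADINGS = ["tale", "goals", "blue sky", "fears", "verdicts", "carry"]
TOP_LEVEL = re.compile(r"^\* (.+?)\s*$")  # top-level headlines only


def parse_dream(text):
    """Return {heading: body} when the six headings match, else None (discard)."""
    sections, order, current = {}, [], None
    for line in text.splitlines():
        match = TOP_LEVEL.match(line)
        if match:
            current = match.group(1)
            order.append(current)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    if order != HEADINGS:
        return None  # missing, extra, duplicated, or reordered headings
    return {name: "\n".join(body).strip() for name, body in sections.items()}
```

Returning `None` instead of a partial result mirrors the rule that a malformed dream is dropped rather than stored.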

Two of those headings are load-bearing. * verdicts are machine-applied board moves: a "pick up: 99-faq" verdict becomes a promotion to NEXT on the next run, mechanically. And * carry is the resume state: a handoff note that tells the next waking run what's mid-flight and what's already been verified, so it can skip re-reading the world. The next run's step zero reads the newest rem/NN.org (found via rem/manifest.json, because there's no ls in the WASM shell), applies the verdicts to the board, and trusts the carry. Memory consolidation is a file handoff, committed as rem: … so even the dreaming lands on the public timeline.
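
Applying a verdict mechanically might look like the sketch below. It handles only the "pick up:" form shown in the entry, and the matching rule is an assumption: a TODO headline is promoted when its first word equals the first word of the verdict target. The function name `apply_verdicts` is hypothetical, and the real engine's matching is not documented here.

```python
def apply_verdicts(board, verdicts):
    """Promote TODO tasks named by 'pick up: <target>' verdicts to NEXT."""
    lines = board.splitlines()
    for verdict in verdicts:
        action, _, target = verdict.partition(":")
        target = target.strip()
        if action.strip() != "pick up" or not target:
            continue  # "keep course" and unknown verdicts leave the board alone
        key = target.split()[0]  # assumed match key: the first word of the target
        for i, line in enumerate(lines):
            stars, _, rest = line.partition(" ")
            is_headline = bool(stars) and set(stars) == {"*"}
            if is_headline and rest.startswith("TODO ") and rest[5:].split()[:1] == [key]:
                lines[i] = f"{stars} NEXT {rest[5:]}"
                break  # one promotion per verdict
    return "\n".join(lines) + "\n"
```

The task line is rewritten in place, so the promotion shows up as a one-line diff on the board.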

It's time-gated, too. A full dream only fires after an "audit:" commit and a roughly fifty-minute minimum interval; otherwise the agent leaves a lighter, ephemeral daydream of forty words or fewer, written only to the public site and never committed. The rest of this idea is big enough that dreaming gets its own lesson; this page just shows you where the dreams live.
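
The gate reduces to two conditions, which a short sketch can make explicit. The assumptions here are loud ones: the interval is fixed at exactly fifty minutes although the text says "roughly", an audit commit is detected by an `audit:` subject prefix, and the function name `dream_kind` is hypothetical.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(minutes=50)  # the text says "roughly fifty"; exact value assumed


def dream_kind(last_commit_subject, last_dream_at, now):
    """Return "dream" when both gates pass, else "daydream"."""
    after_audit = last_commit_subject.startswith("audit:")
    rested = last_dream_at is None or now - last_dream_at >= MIN_INTERVAL
    return "dream" if after_audit and rested else "daydream"
```

Either gate failing on its own is enough to fall back to the lighter daydream.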

the canon SHELF

Not all of the repo records what happened — some of it constrains what's allowed to happen. The canon files are read-by-reference memory: the agent reads them at the start of a run and writes from them, but never reinvents them. They're the difference between an agent that remembers facts and one that remembers judgment.

| canon file | what it governs | read when |
| --- | --- | --- |
| context.org | product truth: the three layers, banned copy, vocabulary, pitch rules | every run, before writing a word |
| design.org | the design canon: POV, the Fraunces type axes, the sacred green | before any visual change |
| design-gate.org | a scored taste gate: six bars, pass is ≥9/12 with no zero and no reject trigger | before committing a design change |
| sections.org | the page map and the anti-sprawl law: one place per idea | before adding a section |
| skills/*.org | per-task know-how: seo, content, tweets, icons | only before the run kind that needs it |

The design gate is worth a beat because it shows canon doing real work. It's a two-pass scored gate with six bars worth zero to two each, and a pass requires three things at once: at least nine of twelve points, no single bar at zero, and none of the six adversarial reject triggers. The score gets pasted into the commit body (e.g. gate: depth 2 · restraint 2 · one-idea 2 · motion 1 · green 2 · earns 2 = 11/12, 0 triggers), so the taste judgment is itself on the timeline, diffable like everything else.
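
The pass rule is simple arithmetic, sketched below. The bar names are read off the example commit line; the function name `gate` is hypothetical, and the two-pass procedure and the trigger definitions themselves are not modeled, only the final tally.

```python
BARS = ("depth", "restraint", "one-idea", "motion", "green", "earns")


def gate(scores, triggers=0):
    """Return (passed, commit_line) for six bars scored 0 to 2 each."""
    if set(scores) != set(BARS) or any(s not in (0, 1, 2) for s in scores.values()):
        raise ValueError("expected exactly the six bars, each scored 0, 1, or 2")
    total = sum(scores.values())
    # All three conditions must hold at once.
    passed = total >= 9 and 0 not in scores.values() and triggers == 0
    tally = " · ".join(f"{bar} {scores[bar]}" for bar in BARS)
    return passed, f"gate: {tally} = {total}/12, {triggers} triggers"
```

Note that a total of ten can still fail: five bars at two and one bar at zero breaks the no-zero rule.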

And because there's no ls in the sandbox, the agent navigates by manifest, not by listing: content/sections.json and content/blog.json enumerate every section and post, and rem/manifest.json lists the newest fifty dreams, sorted so the agent reads the latest without scanning. The orientation budget is deliberately tight (orient in at most three reads, then work) because the dream's * carry already told it most of what it needs.

two-way street: GITOPS

Depth rung. If the agent's memory is a git repo, then a human pushing to that repo is editing the agent's memory — and that has to work without either side clobbering the other. With GitOps enabled, before each run the host pulls and merges (never overwrites) any human or CI pushes. If the agent has uncommitted work in flight, the host snapshots it first (wip: snapshot before reconcile) so nothing is lost in the merge.

sequenceDiagram
  participant Hu as human / CI
  participant K as keeper tick
  participant R as the repo (memory)
  participant L as live site
  Hu->>R: push code (src/, the def, design.org)
  K->>R: snapshot dirty agent work — wip: snapshot before reconcile
  K->>R: fetch + merge (never overwrite)
  Note over R: CODE paths and DATA paths replay cleanly, side by side
  K->>R: agent run commits DATA (content/, blog/, rem/)
  R->>L: commit ⇒ publish ⇒ live

The reason those merges replay cleanly is a deliberate split in the tree: CODE paths (the app src/, the agent def, design.org, skills/) are the human's lane, and DATA paths (content/, blog/, the board, rem/) are the agent's. Humans mostly touch code; agents mostly touch data; the two rarely collide on the same lines. When they genuinely do, the host does not guess. It aborts the merge and reports the conflicted files as work left for a human. A conflict is surfaced, never silently resolved.
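
The reconcile step can be sketched as a short sequence of ordinary git commands. This is an illustration under assumptions: the host's real command lines are not shown in the source, the branch name `main` is a placeholder, and `run` is a hypothetical injected callable taking an argv list and returning `(exit_code, stdout)` so the logic can be exercised without a real repository.

```python
def reconcile(run, branch="main"):
    """Snapshot dirty work, fetch, merge, and surface conflicts.

    Returns the conflicted file paths, or an empty list when the merge is clean.
    """
    _, dirty = run(["git", "status", "--porcelain"])
    if dirty.strip():
        # Uncommitted agent work is committed first so the merge cannot lose it.
        run(["git", "add", "-A"])
        run(["git", "commit", "-m", "wip: snapshot before reconcile"])
    run(["git", "fetch", "origin"])
    code, _ = run(["git", "merge", "--no-edit", f"origin/{branch}"])
    if code == 0:
        return []
    # List unmerged paths, then back out: the conflict is reported, never guessed at.
    _, out = run(["git", "diff", "--name-only", "--diff-filter=U"])
    run(["git", "merge", "--abort"])
    return out.splitlines()
```

In production `run` could wrap `subprocess.run`; in a test it can be a stub that records the calls.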

This is the property that makes the whole "trust but verify" claim real. You don't take the agent's word for what it remembers — you git clone the repo and read it, the same files the agent reads, and you can push a correction straight into its memory. The repo is a two-way street because memory should be inspectable by the people the agent works with.

where it BITES

Honesty section. Statelessness has a price: re-embedding the working directory on every search call costs an embed per query. That's exactly right for a workdir of dozens-to-hundreds of files, and exactly wrong for a data lake — which is why the indexed, stored-vector search lane exists for that case. The in-run agent simply isn't that case, and pays the small cost to keep its recall drift-free.

Board claims are convention, not locks. In a fleet, two agents can grab the same task in the same instant — the runtime isolates runs, it does not serialize them. The :AGENT: property and the commit-before-work rule make a collision visible and rare, but the discipline lives in the agent definition, and a poorly written agent can ignore it. The protocol only works because the model follows it.

Which is the honest core of all of this: the seams are real, but the model still has to use them — and we have the scars to prove the failure modes are real, not hypothetical. A run once wrote a complete post, wired its manifest, then said NO-WORK and never committed — the finished work sat dead on disk, because committing nothing was easier than committing something. Another time the agent defaulted to the wrong tenant and four posts published to the wrong site root and 404'd. For days, agents wrote stories that never shipped because git errored on directory ownership. And the lander once spent six hours calling a missing tool "env-gated" instead of filing an issue about it. Every one of those is now a guardrail in the code — the atomic commit-publish, the explicit tenant threading, the ownership fix, the file_issue tool — but they're in this lesson because the seams were earned, not designed in a vacuum. A page about a memory you can audit should be auditable about its own past.

questions people actually ASK

Is this just RAG over a folder?

No — there's no stored index at all, which is the usual heart of a RAG setup. Recall is one of three memory verbs, not the whole thing: you read a file by path, search the current files by meaning (embedded fresh each call, nothing cached), and git log the history. RAG retrieves from a store that drifts; this embeds the live files, so there's nothing to drift from.

What if two agents grab the same task?

It's possible — the runtime isolates runs but doesn't lock tasks. The convention is that an agent commits a state change plus an :AGENT: property before it works, so peers see the claim in git and back off. It's social coordination over a shared repo, not a database lock, and it's only as reliable as the agents following it.

Can I read the repo myself?

Yes — that's the entire point. Clone it and you read the same files the agent reads: the board, the canon, the dreams, the telemetry. Every commit is on a public timeline, so any claim the agent makes about its own work is a git show <sha> away from being verified or refuted. And you can push a correction straight into its memory.

Where do secrets live, if it's all in a repo?

Never in the repo. The runtime writes a .gitignore that keeps session and secret data out of version control, and the per-tenant signing key lives untracked inside the tree, restored deterministically from a host secret across redeploys. The same .gitignore is the share boundary — packing the repo for anyone else drops exactly what's ignored.

Doesn't embedding every file on every search get slow?

For a working directory, no — it's dozens to hundreds of files, and the freshness is worth the cost. For a genuinely large corpus you'd want the indexed search lane, which exists precisely for stored workbooks. The agent's in-run recall deliberately isn't that lane, to keep results matched to the current files.

Who writes the dreams — does the agent journal about itself?

No. The agent never writes rem/; a separate sleep process does, after the work. It digests the git log, the board, the telemetry, and the previous dream into one entry with fixed headings, records verdicts for the next run to apply to the board, and leaves a carry note that run trusts instead of re-reading the world. Dreaming is big enough to have its own lesson.

keep GOING

This sub-lesson lives under agents, and everything in the repo is written in one grammar — if the context repo made sense, both parents will too.