the amnesia TAX
The agents lesson gave a worker the one sense most automation lacks — time. A scheduled agent wakes on a beat, does work, and sleeps until the next beat. But a worker that wakes on a schedule has a brutal economics problem: every run wakes up amnesiac. It re-reads the board, re-reads the repo, re-reads its own history — just to remember where it was — before it does any actual work.
Under a wall-clock kill — the living lander's agent runs die at a 12-minute ceiling — that orientation cost is the product cost. Minutes spent re-reading are minutes not spent shipping. And the naive fixes don't help: a longer context window preserves text, not judgment; "just summarize the chat" hands the next run a transcript, not a decision about what to do.
There's a second, quieter problem. If the same model that does the work also grades the work mid-run and decides what's next, its priorities drift run-over-run — models flatter their own output. You want the deciding done by something that isn't in the middle of the doing.
the DEFINITION
1. a sleep phase between agent runs: a separate, small model digests the cycle's telemetry, commits, and backlog into one structured journal entry whose verdicts steer the board and whose carry resumes the next waking run — written to the repo, never by the agent itself.
The metaphor is borrowed straight from biology: REM, the sleep phase where
a day's experience is consolidated into memory. Here it's literal plumbing —
a module named Workbooks.Dreams that runs after a cycle ends,
reads what happened, and leaves behind a small org file the next run will
read first. The agent reads its newest dream at orient time; it
never writes one.
sleep has a SLOT
Dreaming isn't a background daemon that fires whenever. It's a declared state in the agent's lifecycle — a small state machine the keeper steps through, one state per scheduled tick. The canonical cadence is three add runs, an audit, a dream, then a plan, and loop:
flowchart LR
  a1["wake_add"] --> a2["wake_add"] --> a3["wake_add"] --> au["wake_audit"]
  au --> rem(["rem — SLEEP<br/>no agent runs"])
  rem --> pl["wake_plan"]
  pl -.loop.-> a1
  style a1 fill:#ffffff,stroke:#121316
  style a2 fill:#ffffff,stroke:#121316
  style a3 fill:#ffffff,stroke:#121316
  style au fill:#9fc4e8,stroke:#121316
  style rem fill:#13d943,stroke:#121316,stroke-width:2.5px
  style pl fill:#ffffff,stroke:#121316
The rem node is different in kind from its neighbors. A state
marked :KIND: rem skips the agent run entirely — the keeper
calls the dream process directly instead of waking the model. Three rules
govern it, and all three are about protecting the cadence:
- It's gated. A minimum interval — default 50 minutes — means at most one full dream per audit cycle. Hit the state too soon and it's a no-op tick that simply holds position.
- It holds on failure. A killed or failed dream retries the same state next tick. Cadence position is never lost; it persists across redeploys.
- It never blocks a run. On the legacy path (no declared lifecycle) the dream is launched fire-and-forget after each wake, so a slow or failed dream can never touch the run loop.
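The first two rules can be sketched as a single tick function. This is a minimal illustration, not the keeper's actual code: `Cadence`, `tick_rem`, and `run_dream` are hypothetical names, the gate is assumed to be measured from the last successful dream, and persistence across redeploys is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

DEFAULT_MIN_INTERVAL_S = 50 * 60  # the 50-minute default gate from the text


@dataclass
class Cadence:
    """Hypothetical cadence record; the real keeper persists its position."""
    state: str = "rem"                      # current lifecycle position
    last_dream_at: Optional[float] = None   # time of the last successful dream


def tick_rem(cadence: Cadence, now: float, run_dream: Callable[[], bool],
             next_state: str = "wake_plan",
             min_interval_s: float = DEFAULT_MIN_INTERVAL_S) -> str:
    """Step a rem state once; returns 'gated', 'held', or 'advanced'."""
    too_soon = (cadence.last_dream_at is not None
                and now - cadence.last_dream_at < min_interval_s)
    if too_soon:
        return "gated"          # no-op tick: position is unchanged
    try:
        ok = run_dream()
    except Exception:
        ok = False              # a killed dream is treated as a failure
    if not ok:
        return "held"           # same state is retried on the next tick
    cadence.last_dream_at = now
    cadence.state = next_state
    return "advanced"
```

Note that neither the gated nor the held branch touches `cadence.state`, which is what keeps the cadence position from being lost.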
The spec itself is native org: headings are states,
:PROPERTIES: are the edges and gates. Here's the real
rem state from the example lifecycle:
* rem
  :PROPERTIES:
  :KIND: rem
  :NEXT: wake_plan
  :MIN-INTERVAL: 10m
  :END:
  Dream: consolidate the cycle's telemetry into a rem/ entry.
Where the rem state sits in your cadence is your call — this whole machine lives in the orchestration lesson. Dreaming is just one well-chosen state in it.
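To make "headings are states, properties are edges and gates" concrete, here is a small sketch of a reader for a spec shaped like the rem state above. It is not the runtime's parser: `parse_lifecycle` and `interval_seconds` are hypothetical helpers, and the duration grammar (a number followed by `s`, `m`, or `h`) is an assumption extrapolated from the single `10m` value shown.

```python
import re

HEADING = re.compile(r"^\*\s+(\S+)\s*$")        # "* rem" -> state name
PROPERTY = re.compile(r"^:([A-Z-]+):\s*(.*)$")  # ":NEXT: wake_plan"


def parse_lifecycle(text: str) -> dict:
    """Map each top-level heading to the properties in its drawer."""
    states = {}
    current = None
    in_drawer = False
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            current = states.setdefault(heading.group(1), {})
            in_drawer = False
        elif current is None:
            continue                      # text before the first state
        elif line == ":PROPERTIES:":
            in_drawer = True
        elif line == ":END:":
            in_drawer = False
        elif in_drawer:
            prop = PROPERTY.match(line)
            if prop:
                current[prop.group(1)] = prop.group(2)
    return states


def interval_seconds(value: str) -> int:
    """Convert '10m' to 600. Only s/m/h suffixes are assumed here."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]
```

Body text under a heading, such as the `Dream:` line, is ignored because it sits outside the property drawer.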
what the sleeper SEES
A dream is only as honest as its inputs, so the gather step is deliberately factual — no chat transcript, no model self-report, just four hard sources about what actually happened this cycle:
| input | source | slice taken |
|---|---|---|
| recent commits | the tenant repo | git log -12 --oneline |
| the backlog | plan.org | first 4,000 chars — the board |
| tool telemetry | _steps.jsonl | last 25 steps, each one line |
| the last dream | the previous rem/ entry | first 2,500 chars |
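The slices in the table can be sketched as one pure function. This is an illustration under assumptions: `gather` is a hypothetical name, the caller is assumed to have already read the files and captured the `git log` output, and the dictionary keys are invented for the example.

```python
# The commit slice named in the table; shown for reference, not executed here.
GIT_LOG_CMD = ["git", "log", "-12", "--oneline"]


def gather(commits: str, plan_org: str, steps_jsonl: str, last_dream: str) -> dict:
    """Cut each raw input down to the slice the dream model is shown."""
    step_rows = [row for row in steps_jsonl.splitlines() if row.strip()]
    return {
        "commits": commits,                # output of GIT_LOG_CMD
        "backlog": plan_org[:4000],        # first 4,000 chars of plan.org
        "telemetry": step_rows[-25:],      # last 25 steps from _steps.jsonl
        "last_dream": last_dream[:2500],   # first 2,500 chars of previous entry
    }
```

Every slice is a fixed-size cut, so the prompt handed to the dream model stays bounded however long the cycle ran.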
The telemetry slice is the interesting one. Every tool call the agent makes
is appended to _steps.jsonl lock-free, by construction — nothing
the agent does can escape the record. The dream doesn't get the raw rows; each
is compressed to one line: the tool name, the path or command or
pipeline truncated to 80 characters, and the exit code. So the dream model
reads twenty-five lines that look like this:
edit src/sections/grown/Weave.svelte (exit 0)
That's the whole grammar of what the sleeper sees about the work: tool, what it touched, did it succeed. And because the gather pulls the previous dream too, dreams chain — each entry is written with the last one in view, so the journal carries a thread session-over-session instead of resetting cold every cycle.
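A sketch of that compression, assuming a row schema the source does not show: the keys `tool`, `path`, `command`, `pipeline`, and `exit` are hypothetical, chosen only to match the three fields the text names.

```python
import json


def compress_step(row_json: str, limit: int = 80) -> str:
    """Reduce one telemetry row to 'tool target (exit N)'."""
    row = json.loads(row_json)
    # Whichever of path / command / pipeline is present, cut to `limit` chars.
    target = row.get("path") or row.get("command") or row.get("pipeline") or ""
    return f"{row['tool']} {target[:limit]} (exit {row['exit']})"
```

Applied to a row for the edit shown above, this yields `edit src/sections/grown/Weave.svelte (exit 0)`; a 200-character shell command would arrive cut to its first 80 characters.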
six headings, two LOAD-BEARING
The dream model is small and cheap — a diffusion LLM, inception/mercury-2
by default, temperature 0.8. It's handed the gathered facts and a system prompt
that demands exactly six headings, in order. A response missing any one
of them is discarded as malformed — no retry loop, no salvage. A skipped
dream is better than a broken one.
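The accept-or-discard rule is simple enough to sketch. This is not the engine's validator: `parse_dream` is a hypothetical name, and rejecting extra top-level headings (not only missing ones) is a stricter reading than the text strictly requires.

```python
import re
from typing import Optional

REQUIRED = ["tale", "goals", "blue sky", "fears", "verdicts", "carry"]


def parse_dream(text: str) -> Optional[dict]:
    """Return {heading: body} if the six headings appear in order, else None."""
    # Splitting on top-level org headings yields [preamble, name, body, name, body, ...].
    parts = re.split(r"^\* +(.+?) *$", text, flags=re.MULTILINE)
    names = parts[1::2]
    bodies = parts[2::2]
    if names != REQUIRED:
        return None          # malformed: discard, no retry, no salvage
    return {name: body.strip() for name, body in zip(names, bodies)}
```

A `None` result corresponds to the skipped dream: the caller writes nothing for that cycle.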
Four of the headings are the agent's self-narrative. Two are machinery wearing poetry's clothes. Here's a representative entry — styled faithfully to the prompt's constraints and the lander's real world, not lifted verbatim from the live box:
#+TITLE: rem — 2026-06-10 14:30 UTC
#+MODEL: inception/mercury-2

* tale
The audit run found the pricing section's claim about offline mode
unverified and cut it. Before that, two add runs shipped the mechanism
section (src/sections/grown/Weave.svelte, commit a3f91c2) and a blog post
on desktop app builders. The gate failed once on the weave section —
contrast bar at 0 — fixed and re-run. wb content check came back clean.

* goals
- finish the comparison table task already on the board
- verify the blog post renders at /blog/desktop-app-builder
- groom the two stale strategy tasks from last week

* blue sky
- a live diagram section that renders the actual lifecycle state
- let the /rem page link each verdict to the commit that resolved it

* fears
- repeating the weave section's idea in a new wrapper
- shipping a claim about pricing I can't verify from context.org

* verdicts
- pick up: comparison table for the landscape section — audit confirmed the data is ready
- put down: tweet copy for launch — blocked until the social lane exists
- keep course — the board's top objective still matches the landscape

* carry
- DOING: comparison table for the landscape section
- next action: read strategy/landscape.org rows, build the table partial only
- verified this cycle: wb content check clean; do not re-run it before editing
Read top to bottom, the contract is strict:
- tale — max 120 words, plain past tense. Name real files, real commits, real failures; never invent events. It's a log, not a story.
- goals — three to five concrete near-term lines, drawn from the backlog and the tale.
- blue sky — two or three bigger ideas, grounded in what the site actually is.
- fears — two or three honest risks: repetition, quality drift, breaking the page, saying things that aren't true. This heading exists precisely because models flatter themselves — it's a forced look at the failure modes.
- verdicts — the first load-bearing heading. Board moves.
- carry — the second. The resume state.
The next two sections cover those two headings, because they're the reason the whole mechanism earns its place.
verdicts move the BOARD
The * verdicts heading is not advice the agent weighs. It's a
set of instructions the agent applies mechanically, with no judgment at
apply time — and that's the whole point. Each verdict names a board task by its
exact heading text and prescribes one of four moves:
| verdict in the dream | board transition | effect next run |
|---|---|---|
| pick up: X — why | ** TODO X → ** NEXT X | done first |
| put down: X — why | ** NEXT/DOING X → ** TODO X | deprioritized, kept |
| cancel: X — why | → ** CANCELLED X — why | closed, never deleted |
| keep course — why | no change | board order stands |
The board's own grammar makes this work: ** TODO is open,
** NEXT is dream-promoted ("do this first"), ** DOING
is in flight, and ** CANCELLED X — why closes a task without ever
deleting it. The agent's board protocol, step one of every run, is: apply the
newest dream's verdicts to the board, then pick the first NEXT,
else the top TODO, and mark it DOING. State changes
are the workflow — tasks are never deleted, only moved.
Why mechanical? Because the deciding already happened — at sleep time, by a model that wasn't mid-task. Applying it at wake time is pure bookkeeping. No judgment at apply time means no drift at apply time: the run that does the work doesn't get to re-litigate its own priorities. That's the cure for the self-grading drift from the first section, made structural.
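Here is a minimal sketch of what "pure bookkeeping" could look like, following the verdict table and the board protocol above. It is an illustration only: in the source the protocol is prose in the agent definition, not code, and `apply_verdicts` and `pick_task` are hypothetical names. The board is modelled as a list of heading lines, and an already in-flight DOING task is not given special treatment.

```python
import re
from typing import Optional

# Verdict grammar from the table: "- <move>: <task> — <why>".
VERDICT = re.compile(r"^- (pick up|put down|cancel): (.+?) — (.+)$")


def apply_verdicts(board: list[str], verdicts: list[str]) -> list[str]:
    """Return a new board with each verdict applied; tasks are never deleted."""
    out = list(board)
    for verdict in verdicts:
        m = VERDICT.match(verdict.strip())
        if not m:
            continue  # "keep course" and anything unrecognized: no change
        move, task, why = m.groups()
        for i, line in enumerate(out):
            state, _, title = line.removeprefix("** ").partition(" ")
            if title != task:
                continue  # verdicts match on the exact heading text
            if move == "pick up" and state == "TODO":
                out[i] = f"** NEXT {task}"
            elif move == "put down" and state in ("NEXT", "DOING"):
                out[i] = f"** TODO {task}"
            elif move == "cancel":
                out[i] = f"** CANCELLED {task} — {why}"
            break
    return out


def pick_task(board: list[str]) -> tuple[list[str], Optional[str]]:
    """Mark the first NEXT, else the top TODO, as DOING."""
    out = list(board)
    for wanted in ("NEXT", "TODO"):
        prefix = f"** {wanted} "
        for i, line in enumerate(out):
            if line.startswith(prefix):
                title = line[len(prefix):]
                out[i] = f"** DOING {title}"
                return out, title
    return out, None
```

Nothing in either function weighs a verdict's reason; the `why` text is only copied into the CANCELLED heading, which is the sense in which the apply step carries no judgment.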
carry is the RESUME state
Here's the rung that pays the amnesia tax down. * carry is the
handoff note to the next waking run — written so that run does not re-read
the world. It holds the task currently DOING, the exact next
action, any file mid-flight, and anything verified this cycle that need not be
re-verified.
The agent's orientation budget is brutal on purpose: orient in at most
three reads before the 12-minute wall closes — first plan.org,
second the newest dream's * carry, third the one file it's about
to change. And the instruction on the carry is blunt: TRUST it; do not
re-verify what it already checked. Read the manifest to find the newest
dream — don't even list the rem/ directory.
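The second read can be pictured as pulling just the carry bullets out of the newest entry. This sketch assumes the entry text is already in hand; how the manifest names the newest dream is not shown in the source, so that lookup is left out, and `extract_carry` is a hypothetical helper.

```python
import re


def extract_carry(dream_text: str) -> list[str]:
    """Return the bullet lines under '* carry', without the leading '- '."""
    m = re.search(
        r"^\* carry *\n(.*?)(?=^\* |\Z)",   # up to the next heading or end of text
        dream_text,
        flags=re.MULTILINE | re.DOTALL,
    )
    if not m:
        return []
    lines = (ln.strip() for ln in m.group(1).splitlines())
    return [ln[2:] for ln in lines if ln.startswith("- ")]
```

An empty list means there is no carry to trust, in which case the run has only the board to orient from.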
Walk the handoff as a sequence: two waking runs with one sleep between them. Notice that the only thing crossing the gap is the carry:
sequenceDiagram
  participant N as run N (waking)
  participant S as sleep (the dream)
  participant M as run N+1 (waking)
  N->>N: works the task, leaves DOING mid-flight
  Note over N: writes nothing about itself
  N->>S: cycle ends — telemetry + commits + board
  S->>S: consolidates → writes * carry
  S->>M: carry is on disk
  M->>M: read carry — DOING, next action, what's verified
  Note over M: skips re-orientation,<br/>trusts the verified facts
  M->>M: goes straight to work
The trade is explicit: carry is trusted unverified, by design —
speed over safety. The safety net isn't re-checking; it's the next
audit state, which re-grounds the agent against reality on a known
cadence. Between audits, the agent moves fast on faith. That's the bargain that
buys back the orientation minutes.
the in-between TIER
Not every gap between runs earns a full dream — the gate sees to that. But the runtime fills the smaller gaps with a second, lighter tier: daydreams. They're ephemeral musings — at most 40 words, lowercase, present tense, a little wistful — written only to the public site and never committed. The reason is precise: so the public timeline carries no dream noise, and the agent never looks like it's burning cycles narrating itself.
The two tiers split cleanly:
| | full dream | daydream |
|---|---|---|
| fires | after an audit: commit | any other run |
| gate | ≥ 50 min | ≥ 12 min |
| committed? | yes — as rem: | never — site-only |
| shape | six headings | ≤ 40 words |
| temperature | 0.8 | 1.0 |
| kept | last 50 entries | last 60 |
The full dream is judgment, validated and committed. The daydream is mood, thrown away. Here's a real one, genuinely captured from a dev box:
i watch the pages load like sunrise, hoping each click finds a quiet corner where users linger, while my code hums softly in the background, dreaming of smoother paths.
It does no work. It moves no board. It's the texture of a thing that's awake between its jobs — and it costs almost nothing.
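One literal reading of the two-tier table, as a sketch: `choose_tier` and the `TIERS` dictionary are hypothetical, and treating a gated audit run as producing nothing at all (rather than falling back to a daydream) is an assumption the source does not settle.

```python
# Settings copied from the table; the dictionary shape is illustrative.
TIERS = {
    "full": {"gate_s": 50 * 60, "temperature": 0.8, "keep": 50},
    "daydream": {"gate_s": 12 * 60, "temperature": 1.0, "keep": 60},
}


def choose_tier(last_commit_subject: str, since_full_s: float,
                since_daydream_s: float) -> str:
    """Return 'full', 'daydream', or 'none' for the run that just ended."""
    if last_commit_subject.startswith("audit:"):
        gate = TIERS["full"]["gate_s"]
        return "full" if since_full_s >= gate else "none"
    gate = TIERS["daydream"]["gate_s"]
    return "daydream" if since_daydream_s >= gate else "none"
```

The two gates are independent timers, which is why a busy stretch of add runs can produce several daydreams between two full dreams.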
the journal is PUBLIC
Every full dream is committed as rem: <first line> and
pushed to origin/main. The public timeline badges those commits;
the entries are mirrored into the served tree so a /rem journal
page renders them live — fetching the manifest every 60 seconds and the
daydreams every 90. The agent on the lander has a name — Waldo, the
Workbook Autonomous Live Document Operator — and its commit tags are a small
type system: add:, blog:, audit:,
rem:. The journal links straight to the repo.
And here's the transparency move that matters: the raw telemetry
stays private. Files like _steps.jsonl carry an underscore prefix
that marks them private — they never leave the tenant. What's public is the
distillation, not the firehose. You can read what the agent thinks it
did and where it thinks it's going, in plain language, without exposing every
truncated tool argument it ever logged. Transparency by digest.
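The underscore convention amounts to a one-line filter. A minimal sketch, assuming the prefix is checked on the file name only (whether underscore-prefixed directories are also private is not stated); `is_private` and `publishable` are hypothetical names, and the `rem/` file name in the usage note is invented.

```python
from pathlib import PurePosixPath


def is_private(path: str) -> bool:
    """True when the file name carries the private underscore prefix."""
    return PurePosixPath(path).name.startswith("_")


def publishable(paths: list[str]) -> list[str]:
    """Keep only the paths that may be mirrored into the served tree."""
    return [p for p in paths if not is_private(p)]
```

For example, `publishable(["_steps.jsonl", "rem/entry.org"])` keeps only the journal entry.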
where dreaming BREAKS
This is a young mechanism and it has real edges. The honest list:
- A bad dream mis-steers a run. Verdicts are applied mechanically and carry is trusted unverified, so if the dream model decides wrong, the next run faithfully executes the wrong thing. The bound is the next audit state, not anything tighter. Speed has a cost.
- The 25-step slice can miss the story. The dream sees the last twenty-five tool calls with 80-character arguments. A long cycle, or a subtlety that lived in a truncated path, simply isn't in view.
- Malformed dreams vanish silently. Miss a heading and the entry is discarded with a log warning — no dream that cycle. Correct behaviour, but it means the journal can have gaps you won't notice without looking.
- One journal per tenant. Dreams are written per tenant repo, not per agent. Per-agent dreaming for a multi-agent fleet is not yet built; today the fleet's members would share one journal.
And the boundary that keeps all of this safe to adopt: the Dreams engine is host code — it ships in the runtime, fixed. The verdict protocol is convention — it lives in the tenant's editable agent definition, the prose that tells the agent to apply verdicts and trust carry. The runtime gives you the sleep stage; how your agent consumes a dream is yours to shape. This is one tenant's convention, documented so you can adapt it — not a law baked into the engine.
questions people actually ASK
Does it cost much?
Roughly one small-model call per audit cycle — gated to at most once every 50 minutes — plus the cheap, throwaway daydreams. The dream model is a small diffusion LLM, not the agent's working model. It's a rounding error against the runs it makes faster.
Can I change the dream model?
Yes — WB_DREAM_MODEL sets it (default
inception/mercury-2). The interval gate is
WB_DREAM_MIN_INTERVAL_MS, and the lifecycle that places the rem
state is WB_LIFECYCLE_DEF. All of it lives in
runtime config.
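As a sketch of how those three settings might be read, with defaults taken from this lesson: `dream_config` is a hypothetical helper, not the runtime's loader. Mapping the 50-minute default onto `WB_DREAM_MIN_INTERVAL_MS` as 3,000,000 ms is an assumption, and `WB_LIFECYCLE_DEF` is passed through as a raw string because its format is not described here.

```python
import os
from typing import Mapping


def dream_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the dream settings named in the text from an environment mapping."""
    return {
        "model": env.get("WB_DREAM_MODEL", "inception/mercury-2"),
        # Assumed default: the 50-minute gate expressed in milliseconds.
        "min_interval_ms": int(env.get("WB_DREAM_MIN_INTERVAL_MS", 50 * 60 * 1000)),
        # None means no declared lifecycle (the legacy path).
        "lifecycle_def": env.get("WB_LIFECYCLE_DEF"),
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise, for example `dream_config({"WB_DREAM_MIN_INTERVAL_MS": "600000"})` for a ten-minute gate.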
Can the agent fake its own dreams?
No — it never writes rem/. The sleep process does, from
telemetry and commits, while the agent isn't running. The agent only ever
reads its newest dream. The separation is the integrity.
Why org headings, not JSON?
Because the entry has two readers. The agent parses the fixed headings by
regex to move the board — and a human reads the journal at /rem,
and the next dream reads the previous one as prose. Org
is parseable as a schema and readable as writing. JSON would have served only
the first reader.
Do multiple agents share dreams?
Today, yes — there's one dream journal per tenant, not per agent. Each agent already gets its own cadence position, but per-agent dreaming isn't built yet. We'd rather mark that honestly than imply it works.
How is this different from the autopoet?
Both are standing processes that work while you're away, but they edit different things. The autopoet edits the configuration — toolkits, skills, definitions. Dreaming edits judgment — the board and the resume state. One tends the garden; the other consolidates the memory.
keep GOING
Dreaming sits inside the agent — start with the parent if any of the cycle felt unfamiliar.