
a sleep phase BETWEEN the runs

A scheduled agent wakes up amnesiac and burns its first minutes re-reading the world. Dreaming fixes that — a separate, cheaper model runs between runs, digesting what the tools did, what got committed, and what's on the board into one structured journal entry. Its verdicts move the board; its carry resumes the next run. The agent never writes it — the sleep process does.

dreaming · 11 min read
A small figure asleep at the foot of a monumental glowing memory-engine that distills a day of work into a single bright crystal tablet — warm blues and green, 1970s sci-fi paperback style, vast machine versus tiny dreamer

the amnesia TAX

The agents lesson gave a worker the one sense most automation lacks — time. A scheduled agent wakes on a beat, does work, and sleeps until the next beat. But a worker that wakes on a schedule has a brutal economics problem: every run wakes up amnesiac. It re-reads the board, re-reads the repo, re-reads its own history — just to remember where it was — before it does any actual work.

Under a wall-clock kill — the living lander's agent runs die at a 12-minute ceiling — that orientation cost is the product cost. Minutes spent re-reading are minutes not spent shipping. And the naive fixes don't help: a longer context window preserves text, not judgment; "just summarize the chat" hands the next run a transcript, not a decision about what to do.

There's a second, quieter problem. If the same model that does the work also grades the work mid-run and decides what's next, its priorities drift run-over-run — models flatter their own output. You want the deciding done by something that isn't in the middle of the doing.

the DEFINITION

dream·ing /ˈdriː·mɪŋ/ noun

1. a sleep phase between agent runs: a separate, small model digests the cycle's telemetry, commits, and backlog into one structured journal entry whose verdicts steer the board and whose carry resumes the next waking run — written to the repo, never by the agent itself.

The metaphor is borrowed straight from biology: REM, the sleep phase where a day's experience is consolidated into memory. Here it's literal plumbing — a module named Workbooks.Dreams that runs after a cycle ends, reads what happened, and leaves behind a small org file the next run will read first. The agent reads its newest dream at orient time; it never writes one.

sleep has a SLOT

Dreaming isn't a background daemon that fires whenever. It's a declared state in the agent's lifecycle — a small state machine the keeper steps through, one state per scheduled tick. The canonical cadence is three add runs, an audit, a dream, then a plan, and loop:

flowchart LR
  a1["wake_add"] --> a2["wake_add"] --> a3["wake_add"] --> au["wake_audit"]
  au --> rem(["rem — SLEEP<br/>no agent runs"])
  rem --> pl["wake_plan"]
  pl -.loop.-> a1
  style a1 fill:#ffffff,stroke:#121316
  style a2 fill:#ffffff,stroke:#121316
  style a3 fill:#ffffff,stroke:#121316
  style au fill:#9fc4e8,stroke:#121316
  style rem fill:#13d943,stroke:#121316,stroke-width:2.5px
  style pl fill:#ffffff,stroke:#121316
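The loop in the diagram can be sketched as a plain list walked by index. This is only an illustration: the state names come from the diagram, while `CADENCE`, `ticks`, and `wakes_agent` are hypothetical names, not the keeper's real API.

```python
# Hypothetical model of the canonical cadence: one state per scheduled tick.
CADENCE = ["wake_add", "wake_add", "wake_add", "wake_audit", "rem", "wake_plan"]


def ticks(n, start=0):
    """Return the states visited over n scheduled ticks, looping at the end."""
    return [CADENCE[(start + i) % len(CADENCE)] for i in range(n)]


def wakes_agent(state):
    """Every state wakes the agent model except the rem state."""
    return state != "rem"


if __name__ == "__main__":
    for state in ticks(7):
        print(state, "agent run" if wakes_agent(state) else "dream process")
```

Seven ticks walk one full loop and land back on the first add run; only the fifth tick skips the agent.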

The rem node is different in kind from its neighbors. A state marked :KIND: rem skips the agent run entirely — the keeper calls the dream process directly instead of waking the model. Three rules govern it, and all three are about protecting the cadence:

  • It's gated. A minimum interval — default 50 minutes — means at most one full dream per audit cycle. Hit the state too soon and it's a no-op tick that simply holds position.
  • It holds on failure. A killed or failed dream retries the same state next tick. Cadence position is never lost; it persists across redeploys.
  • It never blocks a run. On the legacy path (no declared lifecycle), the dream is launched fire-and-forget after each wake, so a slow or failed dream can never touch the run loop.
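The first two rules fit in a few lines. What follows is a minimal sketch under stated assumptions, not the keeper's code: `Cadence`, `step_rem`, and the three return strings are made up for illustration, and time is passed in as plain seconds so the gate is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, Optional

DEFAULT_MIN_INTERVAL_S = 50 * 60  # the 50-minute default from the lesson


@dataclass
class Cadence:
    """Hypothetical persisted cadence position for one agent."""
    state: str = "rem"
    last_dream_at: Optional[float] = None  # seconds; None means never dreamed


def step_rem(cadence: Cadence, now: float, run_dream: Callable[[], bool],
             next_state: str = "wake_plan",
             min_interval_s: float = DEFAULT_MIN_INTERVAL_S) -> str:
    """One tick on a rem state. Returns 'gated', 'failed', or 'dreamed'."""
    last = cadence.last_dream_at
    if last is not None and now - last < min_interval_s:
        return "gated"  # too soon: no-op tick, position held
    try:
        ok = run_dream()
    except Exception:
        ok = False  # a crashed dream is treated like a failed one
    if not ok:
        return "failed"  # state unchanged, so the next tick retries it
    cadence.last_dream_at = now
    cadence.state = next_state
    return "dreamed"
```

The position only advances on success, which is what makes a failed dream retry the same state on the next tick.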

The spec itself is native org: headings are states, :PROPERTIES: are the edges and gates. Here's the real rem state from the example lifecycle:

* rem
:PROPERTIES:
:KIND: rem
:NEXT: wake_plan
:MIN-INTERVAL: 10m
:END:
Dream: consolidate the cycle's telemetry into a rem/ entry.
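To make "headings are states, properties are edges and gates" concrete, here is a small reader for that one state. It is a sketch, not the runtime's parser: `parse_state` and `minutes` are hypothetical helpers, and only the `10m` duration form shown above is handled.

```python
import re

REM_STATE = """\
* rem
:PROPERTIES:
:KIND: rem
:NEXT: wake_plan
:MIN-INTERVAL: 10m
:END:
Dream: consolidate the cycle's telemetry into a rem/ entry.
"""


def parse_state(org_text):
    """Return (state_name, properties) for a single org heading."""
    lines = org_text.splitlines()
    name = lines[0].lstrip("*").strip()
    props, in_drawer = {}, False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == ":PROPERTIES:":
            in_drawer = True
        elif stripped == ":END:":
            in_drawer = False
        elif in_drawer:
            match = re.match(r":([^:]+):\s*(.*)", stripped)
            if match:
                props[match.group(1)] = match.group(2)
    return name, props


def minutes(value):
    """Convert a value like '10m' to an int; other units are not handled here."""
    match = re.fullmatch(r"(\d+)m", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1))


if __name__ == "__main__":
    name, props = parse_state(REM_STATE)
    print(name, props, minutes(props["MIN-INTERVAL"]))
```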

Where the rem state sits in your cadence is your call — this whole machine lives in the orchestration lesson. Dreaming is just one well-chosen state in it.

what the sleeper SEES

A dream is only as honest as its inputs, so the gather step is deliberately factual — no chat transcript, no model self-report, just four hard sources about what actually happened this cycle:

| input | source | slice taken |
| --- | --- | --- |
| recent commits | the tenant repo | git log -12 --oneline |
| the backlog | plan.org | first 4,000 chars (the board) |
| tool telemetry | _steps.jsonl | last 25 steps, each one line |
| the last dream | the previous rem/ entry | first 2,500 chars |

The telemetry slice is the interesting one. Every tool call the agent makes is appended to _steps.jsonl lock-free, by construction — nothing the agent does can escape the record. The dream doesn't get the raw rows; each is compressed to one line: the tool name, the path or command or pipeline truncated to 80 characters, and the exit code. So the dream model reads twenty-five lines that look like this:

edit src/sections/grown/Weave.svelte (exit 0)

That's the whole grammar of what the sleeper sees about the work: tool, what it touched, did it succeed. And because the gather pulls the previous dream too, dreams chain — each entry is written with the last one in view, so the journal carries a thread session-over-session instead of resetting cold every cycle.
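The compression step is easy to sketch. The row field names below (`tool`, `path`, `command`, `pipeline`, `exit`) are assumptions for illustration, since the lesson does not show the raw schema of _steps.jsonl; only the output shape, the 80-character cut, and the 25-step window come from the text.

```python
import json

MAX_TARGET_CHARS = 80
LAST_STEPS = 25


def compress_step(row):
    """One telemetry row -> 'tool target (exit N)'. Field names are assumed."""
    target = row.get("path") or row.get("command") or row.get("pipeline") or ""
    return f"{row['tool']} {target[:MAX_TARGET_CHARS]} (exit {row['exit']})"


def telemetry_slice(jsonl_text, last=LAST_STEPS):
    """Keep only the newest `last` steps, each compressed to one line."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return [compress_step(row) for row in rows[-last:]]


if __name__ == "__main__":
    row = {"tool": "edit", "path": "src/sections/grown/Weave.svelte", "exit": 0}
    print(compress_step(row))  # edit src/sections/grown/Weave.svelte (exit 0)
```

The truncation is where the blind spot comes from: anything past the 80th character of a path or command never reaches the dream model.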

six headings, two LOAD-BEARING

The dream model is small and cheap — a diffusion LLM, inception/mercury-2 by default, temperature 0.8. It's handed the gathered facts and a system prompt that demands exactly six headings, in order. A response missing any one of them is discarded as malformed — no retry loop, no salvage. A skipped dream is better than a broken one.

Four of the headings are the agent's self-narrative. Two are machinery wearing poetry's clothes. Here's a representative entry — styled faithfully to the prompt's constraints and the lander's real world, not lifted verbatim from the live box:

#+TITLE: rem — 2026-06-10 14:30 UTC
#+MODEL: inception/mercury-2

* tale
The audit run found the pricing section's claim about offline mode unverified
and cut it. Before that, two add runs shipped the mechanism section
(src/sections/grown/Weave.svelte, commit a3f91c2) and a blog post on desktop
app builders. The gate failed once on the weave section — contrast bar at 0 —
fixed and re-run. wb content check came back clean.
* goals
- finish the comparison table task already on the board
- verify the blog post renders at /blog/desktop-app-builder
- groom the two stale strategy tasks from last week
* blue sky
- a live diagram section that renders the actual lifecycle state
- let the /rem page link each verdict to the commit that resolved it
* fears
- repeating the weave section's idea in a new wrapper
- shipping a claim about pricing I can't verify from context.org
* verdicts
- pick up: comparison table for the landscape section — audit confirmed the data is ready
- put down: tweet copy for launch — blocked until the social lane exists
- keep course — the board's top objective still matches the landscape
* carry
- DOING: comparison table for the landscape section
- next action: read strategy/landscape.org rows, build the table partial only
- verified this cycle: wb content check clean; do not re-run it before editing

Read top to bottom, the contract is strict:

  • tale — max 120 words, plain past tense. Name real files, real commits, real failures; never invent events. It's a log, not a story.
  • goals — three to five concrete near-term lines, drawn from the backlog and the tale.
  • blue sky — two or three bigger ideas, grounded in what the site actually is.
  • fears — two or three honest risks: repetition, quality drift, breaking the page, saying things that aren't true. This heading exists precisely because models flatter themselves — it's a forced look at the failure modes.
  • verdicts — the first load-bearing heading. Board moves.
  • carry — the second. The resume state.

The next two sections are those two headings, because they're the reason the whole mechanism earns its place.
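The "six headings, in order, or discard" rule can be sketched as a single check. This is an illustration under assumptions, not the Dreams engine's validator: `split_dream` is a hypothetical name, and it treats any extra top-level heading as malformed too.

```python
import re

HEADINGS = ["tale", "goals", "blue sky", "fears", "verdicts", "carry"]


def split_dream(entry):
    """Return {heading: body} when the six headings appear in order, else None."""
    matches = list(re.finditer(r"^\* (.+?)[ \t]*$", entry, flags=re.MULTILINE))
    if [m.group(1) for m in matches] != HEADINGS:
        return None  # malformed: no retry, no salvage
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(entry)
        sections[match.group(1)] = entry[match.end():end].strip()
    return sections
```

Returning `None` rather than a partial result mirrors the stated policy that a skipped dream is better than a broken one.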

verdicts move the BOARD

The * verdicts heading is not advice the agent weighs. It's a set of instructions the agent applies mechanically, with no judgment at apply time — and that's the whole point. Each verdict names a board task by its exact heading text and prescribes one of four moves:

| verdict in the dream | board transition | effect next run |
| --- | --- | --- |
| pick up: X — why | ** TODO X → ** NEXT X | done first |
| put down: X — why | ** NEXT/DOING X → ** TODO X | deprioritized, kept |
| cancel: X — why | ** CANCELLED X — why | closed, never deleted |
| keep course — why | no change | board order stands |

The board's own grammar makes this work: ** TODO is open, ** NEXT is dream-promoted ("do this first"), ** DOING is in flight, and ** CANCELLED X — why closes a task without ever deleting it. The agent's board protocol, step one of every run, is: apply the newest dream's verdicts to the board, then pick the first NEXT, else the top TODO, and mark it DOING. State changes are the workflow — tasks are never deleted, only moved.

Why mechanical? Because the deciding already happened — at sleep time, by a model that wasn't mid-task. Applying it at wake time is pure bookkeeping. No judgment at apply time means no drift at apply time: the run that does the work doesn't get to re-litigate its own priorities. That's the cure for the self-grading drift from the first section, made structural.
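Because the apply step involves no judgment, it reduces to string rewriting. The sketch below assumes the board is plain text with one `** STATE task` line per task and that the " — " separator never appears inside a task name; `parse_verdict`, `apply_verdicts`, and `pick_task` are hypothetical names. On the lander this protocol lives in the agent's prose definition, not in code like this.

```python
import re


def parse_verdict(line):
    """'- pick up: X — why' -> ('pick up', 'X', 'why')."""
    text = line.lstrip("- ").strip()
    if text.startswith("keep course"):
        return ("keep course", None, text.partition(" — ")[2])
    move, _, rest = text.partition(": ")
    task, _, why = rest.partition(" — ")
    return (move, task, why)


def apply_verdicts(board, verdict_lines):
    """Rewrite task states on the board; tasks are moved, never deleted."""
    lines = board.splitlines()
    for verdict in verdict_lines:
        move, task, why = parse_verdict(verdict)
        for i, line in enumerate(lines):
            match = re.fullmatch(r"\*\* (TODO|NEXT|DOING) (.+)", line)
            if match is None or match.group(2) != task:
                continue  # 'keep course' has task None, so it changes nothing
            state = match.group(1)
            if move == "pick up" and state == "TODO":
                lines[i] = f"** NEXT {task}"
            elif move == "put down" and state in ("NEXT", "DOING"):
                lines[i] = f"** TODO {task}"
            elif move == "cancel":  # assumed valid from any open state
                lines[i] = f"** CANCELLED {task} — {why}"
    return "\n".join(lines)


def pick_task(board):
    """Mark the first NEXT, else the top TODO, as DOING."""
    lines = board.splitlines()
    for wanted in ("NEXT", "TODO"):
        prefix = f"** {wanted} "
        for i, line in enumerate(lines):
            if line.startswith(prefix):
                lines[i] = "** DOING " + line[len(prefix):]
                return "\n".join(lines)
    return board
```

Matching on the exact heading text is what keeps the step mechanical: a verdict that names a task slightly differently simply matches nothing.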

carry is the RESUME state

Here's the rung that pays the amnesia tax down. * carry is the handoff note to the next waking run — written so that run does not re-read the world. It holds the task currently DOING, the exact next action, any file mid-flight, and anything verified this cycle that need not be re-verified.

The agent's orientation budget is brutal on purpose: orient in at most three reads before the 12-minute wall closes — first plan.org, second the newest dream's * carry, third the one file it's about to change. And the instruction on the carry is blunt: TRUST it; do not re-verify what it already checked. Read the manifest to find the newest dream — don't even list the rem/ directory.
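Reading the carry is the cheapest of the three reads. As a rough sketch, assuming carry lines follow the `- key: value` shape of the representative entry, a hypothetical `read_carry` helper could pull them out like this:

```python
def read_carry(entry):
    """Return the '- key: value' notes under the '* carry' heading as a dict."""
    notes, in_carry = {}, False
    for line in entry.splitlines():
        if line.startswith("* "):
            # A new top-level heading starts; only 'carry' is of interest.
            in_carry = line[2:].strip() == "carry"
        elif in_carry and line.startswith("- "):
            key, sep, value = line[2:].partition(": ")
            if sep:
                notes[key] = value
    return notes


if __name__ == "__main__":
    entry = (
        "* verdicts\n"
        "- keep course\n"
        "* carry\n"
        "- DOING: comparison table for the landscape section\n"
        "- next action: read strategy/landscape.org rows, build the table partial only\n"
        "- verified this cycle: wb content check clean; do not re-run it before editing\n"
    )
    print(read_carry(entry)["next action"])
```

Nothing here re-checks the "verified this cycle" note, which is the point: the carry is read and trusted.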

Walk the handoff as a sequence: two waking runs with one sleep between them. Notice that the only thing crossing the gap is the carry:

sequenceDiagram
  participant N as run N (waking)
  participant S as sleep (the dream)
  participant M as run N+1 (waking)
  N->>N: works the task, leaves DOING mid-flight
  Note over N: writes nothing about itself
  N->>S: cycle ends — telemetry + commits + board
  S->>S: consolidates → writes * carry
  S->>M: carry is on disk
  M->>M: read carry — DOING, next action, what's verified
  Note over M: skips re-orientation,<br/>trusts the verified facts
  M->>M: goes straight to work

The trade is explicit: carry is trusted unverified, by design — speed over safety. The safety net isn't re-checking; it's the next audit state, which re-grounds the agent against reality on a known cadence. Between audits, the agent moves fast on faith. That's the bargain that buys back the orientation minutes.

the in-between TIER

Not every gap between runs earns a full dream — the gate sees to that. But the runtime fills the smaller gaps with a second, lighter tier: daydreams. They're ephemeral musings — at most 40 words, lowercase, present tense, a little wistful — written only to the public site and never committed. The reason is precise: so the public timeline carries no dream noise, and the agent never looks like it's burning cycles narrating itself.

The two tiers split cleanly:

| | full dream | daydream |
| --- | --- | --- |
| fires | after an audit: commit | any other run |
| gate | ≥ 50 min | ≥ 12 min |
| committed? | yes, as rem: | never, site-only |
| shape | six headings | ≤ 40 words |
| temperature | 0.8 | 1.0 |
| kept | last 50 entries | last 60 |

The full dream is judgment, validated and committed. The daydream is mood, thrown away. Here's a real one, genuinely captured from a dev box:

i watch the pages load like sunrise, hoping each click finds a quiet corner where users linger, while my code hums softly in the background, dreaming of smoother paths.

It does no work. It moves no board. It's the texture of a thing that's awake between its jobs — and it costs almost nothing.
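The split between the two tiers can be sketched as one decision plus one shape check. The gates and the 40-word cap come from the table; `choose_tier` and `valid_daydream` are hypothetical names, and the choice to emit nothing when a gate is closed is an assumption the lesson does not confirm.

```python
FULL_GATE_MIN = 50
DAYDREAM_GATE_MIN = 12
DAYDREAM_MAX_WORDS = 40


def choose_tier(run_kind, minutes_since_full, minutes_since_daydream):
    """Return 'full', 'daydream', or None when the relevant gate is closed."""
    if run_kind == "audit":
        return "full" if minutes_since_full >= FULL_GATE_MIN else None
    return "daydream" if minutes_since_daydream >= DAYDREAM_GATE_MIN else None


def valid_daydream(text):
    """Shape check: at most 40 words, all lowercase."""
    return len(text.split()) <= DAYDREAM_MAX_WORDS and text == text.lower()


if __name__ == "__main__":
    sample = (
        "i watch the pages load like sunrise, hoping each click finds a quiet "
        "corner where users linger, while my code hums softly in the background, "
        "dreaming of smoother paths."
    )
    print(choose_tier("add", 60, 15), valid_daydream(sample))
```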

the journal is PUBLIC

Every full dream is committed as rem: <first line> and pushed to origin/main. The public timeline badges those commits; the entries are mirrored into the served tree so a /rem journal page renders them live — fetching the manifest every 60 seconds and the daydreams every 90. The agent on the lander has a name — Waldo, the Workbook Autonomous Live Document Operator — and its commit tags are a small type system: add:, blog:, audit:, rem:. The journal links straight to the repo.

And here's the transparency move that matters: the raw telemetry stays private. Files like _steps.jsonl carry an underscore prefix that marks them private — they never leave the tenant. What's public is the distillation, not the firehose. You can read what the agent thinks it did and where it thinks it's going, in plain language, without exposing every truncated tool argument it ever logged. Transparency by digest.
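Both conventions here are prefix checks, so a sketch is short. The tag list and the underscore rule come from the text above; `commit_tag` and `is_private` are hypothetical helpers, and applying the underscore rule to the file name only (not to parent directories) is an assumption.

```python
from pathlib import PurePosixPath

COMMIT_TAGS = ("add", "blog", "audit", "rem")


def commit_tag(subject):
    """Return the tag of a commit subject like 'rem: ...', or None if untagged."""
    tag, sep, _ = subject.partition(":")
    return tag if sep and tag in COMMIT_TAGS else None


def is_private(path):
    """A file whose name starts with an underscore stays inside the tenant."""
    return PurePosixPath(path).name.startswith("_")


if __name__ == "__main__":
    print(commit_tag("rem: The audit run found the pricing claim unverified"))
    print(is_private("_steps.jsonl"))
```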

where dreaming BREAKS

This is a young mechanism and it has real edges. The honest list:

  • A bad dream mis-steers a run. Verdicts are applied mechanically and carry is trusted unverified — so if the dream model decides wrong, the next run faithfully executes the wrong thing. The bound is the next audit state, not anything tighter. Speed has a cost.
  • The 25-step slice can miss the story. The dream sees the last twenty-five tool calls with 80-character arguments. A long cycle, or a subtlety that lived in a truncated path, simply isn't in view.
  • Malformed dreams vanish silently. Miss a heading and the entry is discarded with a log warning — no dream that cycle. Correct behaviour, but it means the journal can have gaps you won't notice without looking.
  • One journal per tenant. Dreams are written per tenant repo, not per agent. Per-agent dreaming for a multi-agent fleet is not built yet, so today every member would share one journal.

And the boundary that keeps all of this safe to adopt: the Dreams engine is host code — it ships in the runtime, fixed. The verdict protocol is convention — it lives in the tenant's editable agent definition, the prose that tells the agent to apply verdicts and trust carry. The runtime gives you the sleep stage; how your agent consumes a dream is yours to shape. This is one tenant's convention, documented so you can adapt it — not a law baked into the engine.

questions people actually ASK

Does it cost much?

Roughly one small-model call per audit cycle — gated to at most once every 50 minutes — plus the cheap, throwaway daydreams. The dream model is a small diffusion LLM, not the agent's working model. It's a rounding error against the runs it makes faster.

Can I change the dream model?

Yes — WB_DREAM_MODEL sets it (default inception/mercury-2). The interval gate is WB_DREAM_MIN_INTERVAL_MS, and the lifecycle that places the rem state is WB_LIFECYCLE_DEF. All of it lives in runtime config.
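As a rough picture of how those three variables might be read, here is a sketch. The variable names and the default model come from the answer above; the millisecond default is derived from the 50-minute gate, and `dream_config` itself, along with treating the interval as an integer, is an assumption rather than the runtime's actual loader.

```python
import os

DEFAULT_MODEL = "inception/mercury-2"
DEFAULT_MIN_INTERVAL_MS = 50 * 60 * 1000  # 50 minutes expressed in milliseconds


def dream_config(env=None):
    """Collect dream settings from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return {
        "model": env.get("WB_DREAM_MODEL", DEFAULT_MODEL),
        "min_interval_ms": int(env.get("WB_DREAM_MIN_INTERVAL_MS", DEFAULT_MIN_INTERVAL_MS)),
        "lifecycle_def": env.get("WB_LIFECYCLE_DEF"),  # format not specified in the lesson
    }


if __name__ == "__main__":
    print(dream_config({"WB_DREAM_MIN_INTERVAL_MS": "600000"}))
```

Passing a plain dict instead of `os.environ` keeps the function easy to try without touching the real environment.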

Can the agent fake its own dreams?

No — it never writes rem/. The sleep process does, from telemetry and commits, while the agent isn't running. The agent only ever reads its newest dream. The separation is the integrity.

Why org headings, not JSON?

Because the entry has three readers. The agent parses the fixed headings by regex to move the board, a human reads the journal at /rem, and the next dream reads the previous one as prose. Org is parseable as a schema and readable as writing. JSON would have served only the first reader.

Do multiple agents share dreams?

Today, yes — there's one dream journal per tenant, not per agent. Each agent already gets its own cadence position, but per-agent dreaming isn't built yet. We'd rather mark that honestly than imply it works.

How is this different from the autopoet?

Both are standing processes that work while you're away, but they edit different things. The autopoet edits the configuration — toolkits, skills, definitions. Dreaming edits judgment — the board and the resume state. One tends the garden; the other consolidates the memory.

keep GOING

Dreaming sits inside the agent — start with the parent if any of the cycle felt unfamiliar.