the file that knows too MUCH
The parent lesson ended on a promise that doubles as a threat. An agent's working directory (its notes, its drafts, its long-term memory, its decision logs) lives on the same disk as the deliverable. Hand someone the file and you've handed them the project. Wonderful, until you read it the other way: the file that carries your finished report also carries every wrong turn the agent took getting there, every page it read, every key it signs with.
This is not a hypothetical we dreamed up to sound careful. The module that fixes it says so in its own first lines: the default exists to prevent the exact class of leak this project already hit — beads task data pushed to GitHub. Someone ran a bulk commit, the issue tracker rode along, and a public repo got a tour of the backlog. The lesson was cheap because nothing secret was in those tickets. Next time it might be.
So the worry is real and the stakes are concrete: if egress is all-or-nothing, the whole "the workspace is the artifact" pitch collapses, because no sane person ships a file that also ships their agent's brain. The rest of this page is the one mechanism that makes the pitch survive contact with the send button.
the BOUNDARY
The fix is a single boundary module, one source of truth for what is public and what is private, that every egress path consults before anything leaves, so the line can't drift between them. Its rule, stated once in the code: sharing exposes the work, never the session that produced it.
That module is Workbooks.Private: exactly seventy lines, readable in a sitting.
That smallness is the design. There is no privacy system with settings and
surfaces; there is one list, in one place, and three different exits all ask it the same
question. Define the answer once where you can audit it, and no exit can quietly disagree
with another. The agent never has to remember the boundary, because the boundary
is automatic: appended, stripped, and recorded by the engine, not by anyone's discipline.
three doors, one BOUNCER
There are exactly three ways data leaves a workbook, and the trick is that all three ask the same module the same question first:
- Git commit — the work syncs to a tenant repo and, on push, to a host like GitHub.
- Bundle ship — the workbook is packed into a .wbundle and sent as a file.
- Library pack — a workbook is exported from your library to share or to archive.
Each of those used to be a place where someone could forget. Three exits would mean
three chances to leak, and three lists that drift apart over time until one of them is
wrong. Instead every door routes through Workbooks.Private before a byte
crosses — one bouncer, one guest list, three doors:
flowchart LR
  git["git commit<br/>→ tenant repo, GitHub"]
  bun["bundle ship<br/>→ a .wbundle file"]
  lib["library pack<br/>→ share or archive"]
  priv{"Workbooks.Private<br/>is this private?"}
  out["what actually leaves<br/>the work — never the session"]
  stay["stays home<br/>memory · scratch · telemetry · signing key"]
  git --> priv
  bun --> priv
  lib --> priv
  priv -- "public" --> out
  priv -- "private" --> stay
  style priv fill:#a8d4f0,stroke:#121316,stroke-width:2.5px
  style out fill:#aee5c2,stroke:#121316
  style stay fill:#d9dbd3,stroke:#121316
  style git fill:#ffffff,stroke:#121316
  style bun fill:#ffffff,stroke:#121316
  style lib fill:#ffffff,stroke:#121316
Read the graph as a funnel that narrows on purpose. Three arrows come in from the left — the three ways out — and they all converge on a single blue decision node before anything reaches the world. Whatever the node calls public continues out the right as the work; whatever it calls private drops into the grey box and never moves. One node, asked three times, is the entire guarantee. There is no fourth door that skipped the question.
what stays home, EXACTLY
A boundary you can't enumerate is a boundary you can't trust, so here is the real default list — the constants in the module, with what each thing actually is and why it must not ship. No mystery, nothing you can't audit:
| what stays home | what it actually is | why it can't ship |
|---|---|---|
| _steps.jsonl | per-tool agent telemetry — one line appended per tool call | every move the agent made, including dead ends |
| _ledger.json | the sealed run ledger — _steps.jsonl hash-chained and signed | the signed provenance trail of the session |
| _status.json | pipeline stage status | internal pipeline state, not a deliverable |
| _trace.jsonl | web-surface step traces — step, tool, output excerpt | raw working notes from live runs |
| _telemetry.db | session telemetry | the session, not the work |
| scratch/ | the agent's thinking-out-loud directory | drafts and half-thoughts, by definition |
| .workbooks/ | the Ed25519 signing key — the tenant's identity | the private half of the key. Never. |
| .beads/ | the issue tracker export | the exact thing that leaked once |
| .claude/ | agent config and session scaffolding | operator config, not the artifact |
| memory/ · tmp/ | the two private VFS volumes | long-term agent memory and scratch — not workspace |
| _*.{jsonl,json,db} | any FUTURE session sidecar, caught by the _ prefix rule | future-proofing — no list edit needed |
That last row is the cleverest line in the module. The check is: a basename that starts
with _ and ends in .jsonl, .json, or .db
is private — full stop. So the day some new agent feature writes _dreams.jsonl,
it is already private, with nobody editing a list. The convention does the remembering.
And the module produces the matching .gitignore from those same constants —
this exact block, appended automatically to every tenant repo:
scratch/
.workbooks/
.beads/
.claude/
memory/
tmp/
_*.jsonl
_*.json
_*.db
One list, rendered for git. The file form and the volume form of "private" come out of the same place — which is the next section's whole point.
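To make the "one list, two renderings" idea concrete, here is a minimal Python sketch of a predicate and a gitignore renderer that read the same constants. It is not the module's code: the names `PRIVATE_DIRS`, `SIDECAR_EXTS`, `is_private`, and `gitignore_lines` are hypothetical, and the only things taken from this page are the directory names, the three extensions, and the `_` prefix rule.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical constants mirroring the table above; the real module's
# names and layout may differ.
PRIVATE_DIRS = ("scratch", ".workbooks", ".beads", ".claude", "memory", "tmp")
SIDECAR_EXTS = (".jsonl", ".json", ".db")


def is_private(path: str) -> bool:
    """Return True if a file path must stay home under the default rules.

    Expects a file path such as "scratch/draft.md", not a bare directory.
    """
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    *dirs, basename = parts
    # Anything inside a private directory is private, at any depth.
    if any(d in PRIVATE_DIRS for d in dirs):
        return True
    # The prefix rule: "_" at the start plus a sidecar extension at the end.
    return basename.startswith("_") and basename.endswith(SIDECAR_EXTS)


def gitignore_lines() -> List[str]:
    """Render the same constants as gitignore patterns."""
    return [f"{d}/" for d in PRIVATE_DIRS] + [f"_*{ext}" for ext in SIDECAR_EXTS]
```

Under these assumptions, `is_private("_dreams.jsonl")` is true with no list edit, while `is_private("workspace/data.json")` is false because the basename has no `_` prefix. Both functions read the same two tuples, so changing one constant changes the predicate and the ignore block together.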
one boundary, both FORMS
depth rung · skippable — how one list keeps two shapes in lockstep
Private data exists in two physical forms, and a sloppy design would protect one and
forget the other. Form one: tree files — the _*.jsonl sidecars and the
dot-directories, sitting as actual files when a workbook is unpacked. Form two: VFS
volumes — memory and tmp living as rows inside the
SQLite disk, never unpacked at all. Same data, two skins.
The module keeps them in agreement by emitting the volumes as directory globs
in the gitignore output. So memory/ and tmp/ appear in the
ignore list even though, inside a packed workbook, they aren't directories — they're
database volumes. The names line up across both forms:
| VFS form (inside the SQLite disk) | file form (when unpacked to a tree) |
|---|---|
| volume memory | directory memory/ |
| volume tmp | directory tmp/ |
| volume workspace — the only one that ships | directory workspace/ — public |
The verdict of that table is one sentence: if the SQLite disk is ever unpacked into a real tree, the volumes that are private inside the disk land as directories that git already ignores. One boundary, both forms, so there is no clever path where data that's private as a volume becomes public the moment it lands as a file. The volumes lesson defines those three regions; this page only decides which of them crosses.
strip, then VACUUM
depth rung · skippable — what stripping a disk actually does
On the bundle door, the whole privacy mechanism is two SQL statements run against a copy of the workbook's own disk. The function writes the disk to a temp SQLite file and does this:
DELETE FROM vfs WHERE volume NOT IN ('workspace');
VACUUM; -- DELETE leaves recoverable pages; VACUUM rebuilds the file
The first statement is obvious — keep workspace, drop the rest. The second
is the one people forget, and forgetting it would be a quiet disaster. A SQLite
DELETE doesn't shred bytes; it marks pages free, and the deleted rows sit
recoverable in the file's slack space until something overwrites them. Ship that file and
a curious recipient can carve your agent's memory out of the free pages. VACUUM
rebuilds the file from scratch, so the deleted private rows are gone, not merely
unlinked. Absence, made real.
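The two statements above can be tried end to end with Python's standard `sqlite3` module. This is a sketch under stated assumptions, not the engine's function: only the `vfs` table, its `volume` column, and the two SQL statements come from this page. The `path` and `data` columns, the demo rows, and the helper `make_demo_disk` are invented for illustration, and `public_only` here is a Python stand-in that borrows the name from the diagram.

```python
import os
import sqlite3
import tempfile


def make_demo_disk() -> bytes:
    """Build a tiny SQLite disk with three volumes (hypothetical schema)."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE vfs (volume TEXT, path TEXT, data BLOB)")
            conn.executemany(
                "INSERT INTO vfs VALUES (?, ?, ?)",
                [
                    ("workspace", "report.md", b"final report"),
                    ("memory", "notes.md", b"AGENT-MEMORY-SECRET"),
                    ("tmp", "scratch.txt", b"TMP-SCRATCH"),
                ],
            )
            conn.commit()
        finally:
            conn.close()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def public_only(disk_bytes: bytes) -> bytes:
    """Return a stripped COPY of the disk that holds only the workspace volume."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(disk_bytes)
        # Autocommit mode: VACUUM cannot run inside an open transaction.
        conn = sqlite3.connect(path, isolation_level=None)
        try:
            conn.execute("DELETE FROM vfs WHERE volume NOT IN ('workspace')")
            # Rebuild the file so deleted rows do not linger in free pages.
            conn.execute("VACUUM")
        finally:
            conn.close()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

Searching the returned bytes for `b"AGENT-MEMORY-SECRET"` finds nothing, while `b"final report"` is still there. Whether a DELETE alone would leave the marker behind depends on how a given SQLite build is configured (the `secure_delete` setting), which is one more reason to run the VACUUM unconditionally rather than rely on it.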
Then the bundle records the decision in its manifest, so the choice is legible to whoever receives the file:
sequenceDiagram
participant S as Bundle.ship
participant V as VFS.public_only
participant M as manifest.json
S->>V: hand over the disk bytes
V->>V: DELETE FROM vfs WHERE volume NOT IN ('workspace')
V->>V: VACUUM — rebuild, no recoverable slack
V->>S: stripped disk — workspace only
S->>M: volumes: ["workspace"], private_included: false
Walk that sequence as a short story. The ship step hands the disk to the stripper. The
stripper deletes every volume that isn't workspace, then vacuums the file so nothing is
carvable, and hands back a disk that contains only the public region. Finally the ship step
writes the receipt into the manifest — which volumes shipped, and a flat
private_included: false. The recipient never has to take your word for what's
inside; the manifest says so, and the bytes back it up.
One honest caveat lives right here. The stripper ends in a rescue that, if the strip itself crashes on malformed input, returns the original bytes — it fails open. In practice the input is the engine's own well-formed disk, so the rescue rarely fires; but it exists, and we'd rather you know than discover it. The honesty section comes back to this.
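The difference between failing open and failing closed is small in code and large in consequence. The sketch below is illustrative only: `ship_fail_open` mirrors the rescue behavior described above, while `ship_fail_closed` and `StripFailed` are hypothetical alternatives that this page does not claim exist in the engine. The `strip` argument stands in for whatever function does the actual stripping.

```python
from typing import Callable


class StripFailed(RuntimeError):
    """Hypothetical error for a strip that could not complete."""


def ship_fail_open(disk: bytes, strip: Callable[[bytes], bytes]) -> bytes:
    """Mirror of the documented rescue: a crash returns the ORIGINAL bytes."""
    try:
        return strip(disk)
    except Exception:
        # Fails open: the unstripped disk is what goes out.
        return disk


def ship_fail_closed(disk: bytes, strip: Callable[[bytes], bytes]) -> bytes:
    """Hypothetical alternative: a crash stops the ship instead."""
    try:
        return strip(disk)
    except Exception as exc:
        raise StripFailed("refusing to ship an unstripped disk") from exc
```

With a `strip` that raises on malformed input, the first function hands back the bytes it was given and the second raises, so nothing leaves. That is the whole trade: availability on bad input versus a hard stop.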
why add -A is SAFE
depth rung · skippable — the git door's one trick
The git door earns its safety with a single habit: on every repo init and before
every commit, the engine appends any missing private lines to the tenant repo's
.gitignore. It's idempotent — only the missing lines get added, existing
content is preserved — so the ignore file converges to the full list and stays there.
That one habit is what makes bulk staging safe. A git add -A is the natural
thing an agent reaches for, and ordinarily it's a footgun: it sweeps in everything,
including the .workbooks/ signing key. Here it can't, because the auto-gitignore
has already excluded session and secret data before the add runs. The same protection covers
the mirror-snapshot path the engine uses internally. The signing key is a Fly secret
(WB_SIGNING_KEY, a base64 Ed25519 seed) whose private half must never enter
version control — the ledger's whole attribution model rests on it
staying host-only — and the agent never has to think about any of that. It just commits.
# the tenant repo's .gitignore — appended automatically, idempotently
scratch/
.workbooks/
.beads/
.claude/
memory/
tmp/
_*.jsonl
_*.json
_*.db
# → a bulk `git add -A` physically cannot stage the signing key
The boundary the agent never has to remember is the public/private ignore, written for it. Discipline you have to maintain is discipline that eventually fails; discipline the engine enforces on every commit doesn't.
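The idempotent append is easy to picture as code. Here is a minimal Python sketch, assuming the private lines are the nine patterns listed above. The function name `ensure_private_ignored` and the constant `PRIVATE_LINES` are hypothetical, and the engine's own implementation may differ in details such as whitespace handling.

```python
from pathlib import Path
from typing import List

# The nine patterns from the block above.
PRIVATE_LINES = [
    "scratch/", ".workbooks/", ".beads/", ".claude/", "memory/", "tmp/",
    "_*.jsonl", "_*.json", "_*.db",
]


def ensure_private_ignored(repo_dir: str) -> List[str]:
    """Append any missing private patterns to .gitignore; return what was added."""
    path = Path(repo_dir) / ".gitignore"
    existing = path.read_text().splitlines() if path.exists() else []
    present = {line.strip() for line in existing}
    missing = [line for line in PRIVATE_LINES if line not in present]
    if missing:
        # Existing lines are kept in order; only the missing ones are added.
        path.write_text("\n".join(existing + missing) + "\n")
    return missing
```

Calling it twice in a row adds the missing lines the first time and nothing the second time, which is what lets it run on every init and before every commit without the file growing.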
your .gitignore is the BOUNDARY
Here's the part that turns this from architecture into something you actually drive. The
library door runs a second filter after the built-in defaults: it honors the
repo's own .gitignore — using git's own matcher,
git check-ignore, not a reimplementation. That's a deliberate choice: exact git
semantics, and DRY, because nobody should re-derive gitignore globbing.
The consequence is the most useful sentence on this page. To mark something
"don't share," you mark it exactly the way you already mark "don't track" — a line in
.gitignore. No bespoke privacy API to learn, no manifest to hand-edit. The
instinct every developer already has — drop a path into .gitignore — is the
share boundary. The same logic governs build inputs: on a share, source build inputs
are dropped so you ship the compiled .wasm, not the source, unless you opt in.
So the library pack's share path is a double filter — the engine's safe defaults
(strip_parts), then your repo's own ignore rules layered on top:
flowchart LR
  parts["everything in the workbook"]
  d1["Workbooks.Private<br/>strip the built-in defaults"]
  d2["git check-ignore<br/>honor YOUR .gitignore"]
  ship["what ships"]
  parts --> d1 --> d2 --> ship
  style d1 fill:#a8d4f0,stroke:#121316
  style d2 fill:#aee5c2,stroke:#121316,stroke-width:2.5px
  style ship fill:#ffffff,stroke:#121316
  style parts fill:#ffffff,stroke:#121316
Read it left to right: the full set of parts enters, the blue node removes the engine's always-private defaults, the green node then removes anything your gitignore names, and only what survives both filters ships. The green node is the one you control. The defaults protect you from mistakes; your gitignore expresses your intent — and the share path respects both without you learning a new tool.
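A double filter of this shape can be sketched in Python by shelling out to git for the second pass. This is an assumption-laden illustration, not the library pack's code: `default_private`, `git_ignored`, and `share_paths` are hypothetical names, and the first filter is a compact stand-in for the built-in defaults. The git invocation itself uses real flags: `check-ignore --stdin -z` reads NUL-separated paths and prints the ignored ones, exiting 0 when at least one path is ignored and 1 when none are.

```python
import subprocess
from typing import Callable, List, Set

PRIVATE_DIRS = {"scratch", ".workbooks", ".beads", ".claude", "memory", "tmp"}
SIDECAR_EXTS = (".jsonl", ".json", ".db")


def default_private(path: str) -> bool:
    """Compact stand-in for the engine's built-in defaults (hypothetical)."""
    *dirs, base = path.split("/")
    in_private_dir = any(d in PRIVATE_DIRS for d in dirs)
    return in_private_dir or (base.startswith("_") and base.endswith(SIDECAR_EXTS))


def git_ignored(repo_dir: str, paths: List[str]) -> Set[str]:
    """Ask git itself which paths the repo's ignore rules exclude."""
    if not paths:
        return set()
    proc = subprocess.run(
        ["git", "-C", repo_dir, "check-ignore", "-z", "--stdin"],
        input="\0".join(paths) + "\0",
        capture_output=True,
        text=True,
    )
    # 0 = at least one path ignored, 1 = none ignored, anything else = error.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    return {p for p in proc.stdout.split("\0") if p}


def share_paths(
    repo_dir: str,
    paths: List[str],
    ignored: Callable[[str, List[str]], Set[str]] = git_ignored,
) -> List[str]:
    """Filter one: built-in defaults. Filter two: the repo's own ignore rules."""
    survivors = [p for p in paths if not default_private(p)]
    dropped = ignored(repo_dir, survivors)
    return [p for p in survivors if p not in dropped]
```

The `ignored` parameter is injectable only so the sketch can be exercised without a repository. By default it defers to git, so a pattern you add to `.gitignore` takes effect on the share path with exactly git's semantics.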
saying yes on PURPOSE
Privacy-by-default would be useless if you could never deliberately include the session —
for a handoff, or a backup of your own work. So opt-in exists, and it's always the same
explicit shape: include_private: true. That uniformity is intentional; it's the
same shape as the identity toolkit's --include-private, which emits the Ed25519
private key with a stderr warning and mode-0600 file writes. Including the session is a thing
you say, loudly, on purpose.
The library pack gives that decision a legible name — a purpose, not a raw flag:
| purpose | what ships | who it's for |
|---|---|---|
| :share (default) | the work only — session stripped | anyone you send it to |
| :archive | everything, session included | YOU — a backup of your own work |
The verdict of that table: a backup of your own work that forgot your session would be a
bad backup, so :archive keeps everything — a re-download of your own snapshot
still has your agent's memory and logs intact. :share strips, because the person
you're sending it to wants the report, not your run history. Same verb, one explicit word of
difference. And whichever one you chose, the manifest's private_included field
records it — so the recipient of a bundle can read manifest.json and see whether
the session is inside before they trust or open anything.
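The relationship between purpose, opt-in, and the manifest can be written down as a small mapping. The sketch below is in Python with hypothetical names (`plan_pack`, `ALL_VOLUMES`). The field names `volumes` and `private_included` come from this page, while the order of the three volumes on opt-in is an assumption.

```python
from typing import Dict, List, Union

PUBLIC_VOLUMES = ["workspace"]
# Assumed ordering; this page only says "all three" ship on opt-in.
ALL_VOLUMES = ["workspace", "memory", "tmp"]


def plan_pack(purpose: str = "share") -> Dict[str, Union[List[str], bool]]:
    """Map a purpose to the manifest fields that record the decision."""
    if purpose == "share":
        include_private = False
    elif purpose == "archive":
        include_private = True
    else:
        # Unknown purposes are rejected rather than guessed at.
        raise ValueError(f"unknown purpose: {purpose!r}")
    volumes = ALL_VOLUMES if include_private else PUBLIC_VOLUMES
    return {"volumes": list(volumes), "private_included": include_private}
```

Calling `plan_pack()` with no argument yields the stripped form, so the default path is the safe one, and including the session requires naming `"archive"` explicitly.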
One thing survives the strip in every case: provenance. Even on a default share, the HTML is C2PA-signed with the tenant's DID — the signature ships, the session log doesn't. Proof travels; the session doesn't.
where the boundary ENDS
Honesty section. Five things this boundary is not, stated plainly.
The CLI can't opt in yet. The wbx pack verb always ships the stripped
form — it doesn't expose --include-private or --archive today. The
opt-in is real, but it's an engine/API option for now; the CLI flag is future work. If you
want an archive today, you reach the engine, not the command line.
The strip fails open. As the VACUUM section noted, if the stripper crashes on malformed input it returns the original bytes rather than failing closed. The input is normally the engine's own disk, so this is a narrow window — but it's a real one, and we name it.
There's no dedicated test suite. We looked: there is no test file targeting this module specifically. The guarantee here is architectural — one choke point, three callers, seventy auditable lines — not belt-and-braces suite coverage. That's a real property and a real limit at once; honest is honest.
Private isn't hidden from you. The boundary protects egress, not your own
control plane. Your engine's surfaces still read _steps.jsonl
to render live agent activity — by design. You can watch every move your agent makes; the
point is only that none of it leaves. Private means "doesn't cross the boundary," not "secret
from its owner."
Opt-in is irreversible once sent. A recipient of an include_private
bundle has everything — the memory, the telemetry, the lot — and you cannot un-send it. The
manifest tells them what they got; it can't claw it back for you. Choose
:archive for yourself, not for strangers.
questions people actually ASK
Is memory deleted from my copy when I share?
No. Egress rewrites a copy on the way out — the strip happens in a temp file,
never against your live disk. Your memory and tmp are untouched
by sharing; only the thing that left was slimmer than the thing you kept.
Can I see whether a bundle I received contains private data?
Yes — open manifest.json. The volumes field lists what shipped
(["workspace"] for a normal share, all three on opt-in) and
private_included is a flat true/false. The receipt is right there; you don't
have to spelunk the bytes to know.
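If you would rather check the receipt in code than by eye, a reader can be as small as this Python sketch. It assumes you already have the text of `manifest.json` in hand; how to get it out of a bundle is not covered on this page and is not guessed at here. The function name `inspect_manifest` is hypothetical.

```python
import json
from typing import Tuple


def inspect_manifest(manifest_text: str) -> Tuple[bool, str]:
    """Return (may_contain_private, reason) for a bundle's manifest text."""
    manifest = json.loads(manifest_text)
    volumes = manifest.get("volumes")
    flag = manifest.get("private_included")
    # A manifest without both fields proves nothing, so treat it as unsafe.
    if not isinstance(volumes, list) or not isinstance(flag, bool):
        return True, "manifest lacks volumes or private_included"
    extra = [v for v in volumes if v != "workspace"]
    if flag or extra:
        return True, f"private_included={flag}, extra volumes={extra}"
    return False, "workspace only"
```

A normal share reports `(False, "workspace only")`. A manifest whose flag and volume list disagree is reported as possibly private, on the theory that a receipt you cannot fully read should be treated with caution.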
How do I make my own file private?
Three ways, all things you already know. Give it an _ prefix and a
.json/.jsonl/.db extension and it's caught by the
prefix rule. Or add a line to .gitignore — the library share honors it via
git check-ignore. Or write it into the memory or tmp
volume. No special API; the boundary speaks the tools you have.
Is this encryption?
No — it's absence, not ciphertext. Private data simply isn't in the shipped bytes. Encryption-that-ships — where the bytes are present but unreadable without a key — is a different mechanism, covered in sealed sections and escrow. This page is about what leaves; those are about what unlocks.
Does provenance get stripped too?
No. The signature ships even when the session doesn't — the HTML is C2PA-signed with the tenant's DID on a default share. Proof travels; the session log stays home. A recipient can verify who published it without ever seeing how it was made.
Why one module instead of a setting per door?
Because three lists drift and one list can't. If git, bundle, and library each owned their own notion of private, the day one of them fell behind is the day something leaks. One seventy-line module, consulted by all three, means the boundary is defined exactly once — you can read it, audit it, and trust that every door agrees because they're all reading the same sentence.
keep GOING
This page is the sharing consequence of the disk — the neighbors below make it whole.