telemetry — one event, many readers

software that works UNWATCHED

You run things that work while you're not looking — keepers overnight, agents on a schedule, a workflow that fires at nine. The next morning the only question that matters is small and unanswerable in most stacks: what did it actually do?

The industry's answer to that question is a zoo. Stdout in one place, app logs in another, traces in a third, metrics in a fourth, a dashboard bolted over all of it, and an APM bill at the end of the month. Each tool has its own format; each integration is its own little project. You don't have an answer — you have five partial answers in five shapes, and the work of reconciling them is yours.

For agent systems it's worse, because the interesting unit isn't a request; it's a tool call. The thing you want to know is the sequence of moves: read this, ran that, fetched the other, hit a wall here. And half of those moves happen inside a sandbox you can't printf from. The most observability-hungry collaborator you've ever run is the one you can see the least.

the DEFINITION

te·lem·e·try /təˈlɛm·ə·tri/ noun

1. one event shape (a tool-call record of nine fields) written at one chokepoint and appended to one file per run, where every observer is a reader of those lines, never a second logger.

The whole design is in that last clause. There is one grammar of signal — the tool-call event — and many literacies: the summary command reads it, the website wire reads it, the dream digest reads it, the ledger seals it. Nothing exports a second format because nothing needs to. Here are the nine fields, with their real limits:

field	what it holds
`step`	monotonic counter for this run (the Dock lane uses an atomic counter; WASM spans leave it null)
`agent`	which agent took the step — the agent's name, or null for a singleton. This is how the activity wire groups by worker.
`tool`	the tool name: `shell`, `read`, `fetch`, or a prefixed origin, `command:<name>` or `wasm:<name>`
`args`	the call's arguments — path, cmd, query, url
`output`	the result — sliced to 4000 chars in memory, 200 chars in the file line. A record, not a transcript.
`exit_code`	0 for success, non-zero for failure
`error`	the error string, or null
`dur_ms`	wall time from the monotonic clock
`ts`	wall-clock seconds — `system_time(:second)`

That file is _steps.jsonl — one JSON line per tool call, appended for the life of the run. Everything else on this page is a way of reading it.
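To make the shape concrete, here is a minimal Python sketch of one record and its two output limits. The helper names make_event and to_file_line are hypothetical, and the compact JSON encoding is an assumption; the engine's own encoder may format lines differently.

```python
import json
import time

# Limits from the field table above: output is kept to 4000 chars in memory
# and cut again to 200 chars when the event is written as a file line.
MEMORY_OUTPUT_LIMIT = 4000
FILE_OUTPUT_LIMIT = 200

FIELDS = ("step", "agent", "tool", "args", "output",
          "exit_code", "error", "dur_ms", "ts")


def make_event(step, agent, tool, args, output, exit_code, error, dur_ms):
    """Build the in-memory nine-field tool-call record (hypothetical helper)."""
    return {
        "step": step,              # None for WASM spans
        "agent": agent,            # None for a singleton agent
        "tool": tool,              # "shell", "read", "command:<name>", "wasm:<name>", ...
        "args": args,              # path, cmd, query, url
        "output": output[:MEMORY_OUTPUT_LIMIT],
        "exit_code": exit_code,    # 0 means success
        "error": error,            # error string or None
        "dur_ms": dur_ms,
        "ts": int(time.time()),    # wall-clock seconds
    }


def to_file_line(event):
    """Serialize one event as exactly one JSONL line, re-slicing output to 200 chars."""
    slim = {k: event[k] for k in FIELDS}
    slim["output"] = slim["output"][:FILE_OUTPUT_LIMIT]
    # json.dumps escapes embedded newlines, so the result is always a single line.
    return json.dumps(slim, separators=(",", ":")) + "\n"


event = make_event(1, None, "shell", {"cmd": "ls"}, "x" * 5000, 0, None, 12)
line = to_file_line(event)
```

Keeping the longer slice in memory and the shorter one on disk is what makes the file a record rather than a transcript.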

the CHOKEPOINT

The reason nothing escapes is structural, and it's worth being precise about. The event isn't built by the caller and isn't optional. It's assembled and appended at a single function — Agent.log_step — that sits inside the tool-call loop, fired for every step regardless of any caller-supplied on_step hook. A caller can subscribe to the live feed; a caller cannot opt a step out of the record. The phrase in the source is exact: nothing escapes by construction.

sequenceDiagram
  participant M as the model
  participant L as the tool loop
  participant F as _steps.jsonl
  participant S as on_step subscriber (optional)
  M->>L: call a tool
  Note over L: exec_bounded — 150s wall-clock ceiling
  L->>F: append one event (lock-free)
  L-->>S: same event, if anyone is watching
  Note over F: the append always happens<br/>the subscriber is a bonus

Two details in that picture carry weight. First, the append is lock-free — it's the cheap, common path, so logging never becomes the bottleneck the loop is trying to observe. Second, every tool call is wrapped in a 150-second wall-clock bound. A wedged tool doesn't stall the run forever and vanish from the record — it times out, and the timeout becomes a tool-error event like any other. A hang is data, not a black hole. That single property is why the summary you read in the morning can be trusted to be complete even when last night went badly.
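The chokepoint and the bound can be sketched together in Python. This is an illustration under stated assumptions, not the engine's code: StepLog, run_bounded, and the exit code 124 for a timeout are hypothetical, and a worker thread stands in for whatever exec_bounded really does.

```python
import json
import threading
import time

TOOL_TIMEOUT_S = 150  # per-call wall-clock ceiling from the text above
TIMEOUT_EXIT = 124    # hypothetical exit code for a timed-out tool


def run_bounded(fn, args, timeout):
    """Run fn(**args) in a worker thread; stop waiting after `timeout` seconds."""
    result = {}

    def target():
        try:
            result["output"] = fn(**args)
        except Exception as exc:  # a failing tool becomes data, not a crash
            result["error"] = str(exc)

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # Python cannot kill the thread; the wedged worker is simply abandoned.
        return "", TIMEOUT_EXIT, "tool timeout"
    if "error" in result:
        return "", 1, result["error"]
    out = result.get("output")
    return ("" if out is None else str(out)), 0, None


class StepLog:
    """Hypothetical single chokepoint: every tool call is recorded here."""

    def __init__(self, path, on_step=None):
        self.path = path
        self.on_step = on_step  # optional live subscriber
        self.step = 0

    def call(self, tool, args, fn, agent=None, timeout=TOOL_TIMEOUT_S):
        self.step += 1
        started = time.monotonic()
        output, exit_code, error = run_bounded(fn, args, timeout)
        event = {
            "step": self.step, "agent": agent, "tool": tool, "args": args,
            "output": output[:4000], "exit_code": exit_code, "error": error,
            "dur_ms": round((time.monotonic() - started) * 1000),
            "ts": int(time.time()),
        }
        # The append is unconditional and happens before any subscriber runs.
        # (A plain file append; the engine's lock-free write path is not modeled.)
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({**event, "output": output[:200]}) + "\n")
        if self.on_step is not None:
            self.on_step(event)
        return event
```

Note the ordering: the subscriber only ever sees an event that is already on disk, so a missing or crashing subscriber cannot cost the record a line.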

three writers, one GRAMMAR

Depth rung — skippable, but it's the part that makes the rest free. An agent doesn't only call native tools. It calls toolkit commands across the Dock membrane, and it runs work inside the WASM sandbox. Those are different worlds with different boundaries — and all three write the same file in the same shape:

flowchart TD
  n["native agent tools
tool: shell · read · fetch"]
  d["Dock command calls
tool: command:<name>"]
  w["WASM spans
tool: wasm:<name>"]
  f[["_steps.jsonl — one shape, one file"]]
  n --> f
  d --> f
  w --> f
  f --> sum["summary / index"]
  f --> wire["the /_activity wire"]
  f --> dream["the dream digest"]
  f --> seal["the signed ledger"]
  style f fill:#aee5c2,stroke:#121316,stroke-width:2.5px
  style n fill:#ffffff,stroke:#121316
  style d fill:#9fc4e8,stroke:#121316
  style w fill:#f3c5a3,stroke:#121316

The convergence is the whole trick. A Dock command call appends tool: "command:<name>" with its own exit code and timing; its step counter is an atomic so concurrent commands can't collide. The workdir it writes to is held in host context — the component inside the sandbox never sees the path, so a guest can't aim a write at the log. A WASM span — the host side of an instrument-enter / instrument-exit import pair — writes tool: "wasm:<name>" on exit, with the span's duration computed from a public span-stack because enter and exit are two separate crossings back into the host.

Because all three land the same nine fields in the same file, the reader that rolls up a run needs zero new query code to count a sandboxed command the same as a native one. One grammar in; one read out. That's not a convenience — it's the reason the read stack below is small.
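A rough Python sketch of the two non-native writers follows, assuming one span stack per guest instance. DockSteps and SpanStack are hypothetical names; the point is only that both end in the same nine-field record as a native tool call, with step left as None for spans.

```python
import itertools
import threading
import time


class DockSteps:
    """Hypothetical atomic step counter for concurrent Dock command calls."""

    def __init__(self):
        self._next = itertools.count(1)
        self._lock = threading.Lock()

    def next_step(self):
        with self._lock:  # two concurrent commands can never draw the same number
            return next(self._next)


class SpanStack:
    """Hypothetical host-side sink for instrument-enter / instrument-exit pairs.

    Assumes one stack per guest instance, so spans from different guests never mix.
    """

    def __init__(self, append):
        self.append = append  # the same append the native tools use
        self.open_spans = []
        self.lock = threading.Lock()

    def enter(self, name):
        # First crossing into the host: remember the span and its start time.
        with self.lock:
            self.open_spans.append((name, time.monotonic()))

    def exit(self, exit_code=0, error=None):
        # Second, separate crossing: pop the innermost span and emit one event.
        with self.lock:
            name, started = self.open_spans.pop()
        self.append({
            "step": None,  # WASM spans leave step null
            "agent": None,
            "tool": f"wasm:{name}",
            "args": {},
            "output": "",
            "exit_code": exit_code,
            "error": error,
            "dur_ms": round((time.monotonic() - started) * 1000),
            "ts": int(time.time()),
        })
```

The stack is what lets two separate host crossings be paired into one duration, and it is also why nested spans close innermost first.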

One honest note on this lane: the WASM-span path is a complete host sink, but it isn't wired end-to-end yet. No telemetry capability exists in the policy profiles for a guest to call it through, and no test exercises it. The feasibility spike confirmed nested spans roll up correctly — a two-call run summing to forty milliseconds — but the guest-side transform tooling is an external blocker. We'd rather mark that clearly than imply the sandbox is already narrating itself.

the READ stack

Once the lines exist, observing is just reading them at different altitudes. The same grammar answers different questions:

reader	question it answers	surface	liveness
`summary/1`	what happened in this one run?	`wb telemetry <slug>` · `/api/telemetry/:slug`	live — works mid-flight, no db needed
`index/2`	what ran lately, across sessions?	`wb telemetry` · `/api/telemetry`	live — a pure scan, no extra writes
`persist/3`	the durable per-run query db	`_telemetry.db` (SQLite)	at run end — single writer, no contention
`/_activity`	what is it doing right now? (public)	the public plane, anonymous read-only	live — last 8 lines, slimmed
`AgentStream`	watch this run unfold, step by step	`/api/run/:id/stream` WebSocket	live — per-step frames

The first two readers earn the most. summary/1 is universal and live: it rolls a run up into stage, task count, tool calls, total milliseconds, errors, and the last fifteen steps by reading the file directly, so any run is observable even mid-flight and even with no persisted database. index/2 is the cross-session view, newest first. It's a pure scan of the runs directory, which means it costs nothing and can't drift from the per-run truth, because it has no truth of its own to drift from.
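The rollup itself is small. Below is a Python sketch of what summary/1 and index/2 compute, under assumptions: the error test (non-zero exit code or a non-null error), ordering the index by file modification time, and skipping a half-written last line are all guesses, and the task count is left out.

```python
import json
from pathlib import Path


def read_events(steps_path):
    """Parse a _steps.jsonl file, skipping a half-written trailing line mid-run."""
    events = []
    with open(steps_path, encoding="utf-8") as f:
        for line in f:
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # partial append from a run that is still going
    return events


def summary(run_dir, tail=15):
    """Roll one run up into stage, calls, errors, total ms, and the last steps."""
    run_dir = Path(run_dir)
    steps = run_dir / "_steps.jsonl"
    events = read_events(steps) if steps.exists() else []
    status = run_dir / "_status.json"
    stage = "unknown"
    if status.exists():
        stage = json.loads(status.read_text(encoding="utf-8")).get("stage", "unknown")
    return {
        "stage": stage,
        "calls": len(events),
        "errors": sum(1 for e in events if e.get("exit_code") or e.get("error")),
        "total_ms": sum(e.get("dur_ms") or 0 for e in events),
        "last": events[-tail:],
    }


def index(runs_dir):
    """Pure scan of the runs directory, newest run first; writes nothing."""
    rows = []
    for run in Path(runs_dir).iterdir():
        steps = run / "_steps.jsonl"
        if steps.is_file():
            rows.append((steps.stat().st_mtime, run.name))
    rows.sort(reverse=True)
    return [(name, summary(Path(runs_dir) / name)) for _, name in rows]
```

Because index calls summary on each run instead of keeping its own table, there is nothing for it to fall out of sync with.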

Here's the loop you'll actually live in. The index, then a single run, then the proof:

$ wb telemetry
SLUG          STAGE    CALLS  ERRORS  MS
wulu-refresh  done     42     0       183204
brand-run     error    17     3       96110

$ wb telemetry brand-run
stage=error calls=17 errors=3 total_ms=96110
  ! step 9 shell: tool timeout
  ! step 11 fetch: exit 1 …

$ wb ledger wulu-refresh
tamper-evident=ok attributable=ok count=42 did=did:key:z6Mk…

That second block is the morning answer to "what did my agent actually do?": the failing run named, the two bad steps quoted with their exit shape, and the timeout from the 150s bound showing up exactly where the chokepoint promised it would. No dashboard, no integration. One command reading one file.

Two softer readers ride on the same feed. The /_activity wire is the anonymous, read-only public view — it tails the last few lines of the tenant's _steps.jsonl and slims each to a tool, a target, a timestamp, and an agent, so a stranger can watch a public workbook work without seeing its outputs. And Thoughts writes the eight-word, debounced narration of the live feed you see on a board — generated lazily, only when someone is actually watching, and never otherwise.
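The slimming step of the /_activity wire can be sketched in Python as a tail plus a projection. The function name activity and the rule for choosing the target (the first of path, cmd, query, url present in args) are assumptions; the eight-line window comes from the read-stack table.

```python
import json
from collections import deque

TARGET_KEYS = ("path", "cmd", "query", "url")  # assumed order for picking a target


def activity(steps_path, n=8):
    """Tail the last n step lines and slim each to tool, target, ts, and agent."""
    with open(steps_path, encoding="utf-8") as f:
        tail = deque(f, maxlen=n)  # keeps only the last n lines while reading
    slim = []
    for line in tail:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # a half-written line at the tail of a live run
        args = event.get("args") or {}
        target = next((args[k] for k in TARGET_KEYS if k in args), None)
        # Outputs, exit codes, and errors are deliberately dropped.
        slim.append({"tool": event.get("tool"), "target": target,
                     "ts": event.get("ts"), "agent": event.get("agent")})
    return slim
```

This version reads the whole file to find the tail, which is fine for a sketch; a production reader would seek from the end.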

record → MEMORY

Here the record stops being logging and becomes something logs never are: memory. After a run, a sleep phase digests the recent telemetry, the git log, and the backlog into a single journal entry. The agent reads its newest entry when it next wakes — so the trace of last night isn't a graveyard of lines, it's the thing the next run orients against.

sequenceDiagram
  participant R as run ends
  participant G as gather(steps + git log + plan)
  participant M as a small model
  participant J as rem/*.org
  participant N as the next waking run
  R->>G: last 25 steps, reformatted
  G->>M: digest it
  M->>J: one org entry, six fixed headings
  N->>J: read newest at orient time
  Note over N: resume from carry,<br/>don't re-read the world

The transformation is concrete. The full dream takes the last 25 steps and reformats each line to a terse move such as shell wb toolkit verify rss (exit 0), then feeds that, the git log, and the plan to a small model (inception/mercury-2 by default). Out comes an entry like rem/2026-06-12-0415.org under six fixed headings: * tale, * goals, * blue sky, * fears, * verdicts, * carry. The * verdicts lines (pick up:, put down:, cancel:) are applied mechanically to the plan board, and * carry is the resume state the next run reads instead of re-reading everything from scratch.
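Two pieces of that pipeline are mechanical enough to sketch in Python: turning a step into a terse move, and pulling the verdict lines out of an entry so they can be applied to the board. Both functions are hypothetical, the choice of which argument counts as the target is an assumption, and the model call in between is not shown.

```python
TARGET_KEYS = ("cmd", "path", "query", "url")  # assumed order for picking the target
VERDICT_PREFIXES = ("pick up:", "put down:", "cancel:")


def terse_move(event):
    """Reformat one step as a terse move, e.g. 'shell wb toolkit verify rss (exit 0)'."""
    args = event.get("args") or {}
    target = next((str(args[k]) for k in TARGET_KEYS if k in args), "")
    head = " ".join(part for part in (event["tool"], target) if part)
    return f"{head} (exit {event['exit_code']})"


def parse_verdicts(org_text):
    """Collect (verb, item) pairs from the lines under the '* verdicts' heading."""
    verdicts, in_verdicts = [], False
    for raw in org_text.splitlines():
        line = raw.strip()
        if line.startswith("* "):
            in_verdicts = line == "* verdicts"  # any other heading ends the section
            continue
        if not in_verdicts:
            continue
        line = line.removeprefix("- ")  # tolerate org list bullets (assumption)
        for prefix in VERDICT_PREFIXES:
            if line.startswith(prefix):
                verdicts.append((prefix[:-1], line[len(prefix):].strip()))
    return verdicts
```

Only lines under * verdicts are honored, so a "pick up:" that appears in * carry or * tale never touches the board.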

The cadence is deliberate. A full dream fires only after an "audit:" commit, and only when at least fifty minutes have passed since the last one; it commits itself as rem: <first line>. A lighter daydream (forty words, never committed) fires every twelve minutes or so from just the last six tool names. The agent reads its newest dream at orient time; it never writes one. Sleeping and waking are separate jobs, and telemetry is the bridge between them. The full story lives in the dreaming lesson.
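The gating rules reduce to a few comparisons. This Python sketch assumes the trigger is the latest commit message starting with "audit:" and that the relevant timestamps are already known; the function names are hypothetical.

```python
from datetime import datetime, timedelta

FULL_DREAM_GAP = timedelta(minutes=50)
DAYDREAM_GAP = timedelta(minutes=12)


def should_full_dream(latest_commit_msg, last_dream_at, now):
    """A full dream needs an 'audit:' commit and fifty minutes since the last dream."""
    if not latest_commit_msg.startswith("audit:"):
        return False
    return last_dream_at is None or now - last_dream_at >= FULL_DREAM_GAP


def should_daydream(last_daydream_at, now):
    """A daydream fires roughly every twelve minutes."""
    return last_daydream_at is None or now - last_daydream_at >= DAYDREAM_GAP


def dream_commit_message(entry_text):
    """Full dreams commit as 'rem: <first line>'; daydreams are never committed."""
    lines = entry_text.strip().splitlines()
    return f"rem: {lines[0] if lines else ''}"
```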

record → PROOF

Depth rung. The same file that feeds memory can be sealed into proof. The ledger doesn't write a second log — it computes a seal over the one the telemetry already wrote.

flowchart LR
  s[["_steps.jsonl — raw bytes"]]
  s -- "h_i = sha256(h_i-1 ‖ line_i)" --> chain["hash chain
genesis: workbooks-ledger-v1"]
  chain -- "sign head with did:key" --> seal["_ledger.json
{v, did, count, head, sig, ts}"]
  seal -- "anchor: commit into the repo" --> git["the tenant repo"]
  style s fill:#aee5c2,stroke:#121316,stroke-width:2.5px
  style seal fill:#13d943,stroke:#121316,stroke-width:2.5px

The chain folds each raw line into the previous hash, starting from the genesis string workbooks-ledger-v1, and the head is signed with the tenant's Ed25519 did:key. Verification returns two facts, tamper-evident (no line was changed) and attributable (this agent, this key, signed it), plus the count and head. It's the same wb ledger <slug> line from the read stack above, sealed automatically at the end of a workflow run. The seal lives over the log, not beside it; the ledger lesson owns the full story.
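The hash-chain half of the seal can be sketched with Python's hashlib. Two assumptions: h_0 is taken to be the SHA-256 of the genesis string, and each line is hashed as its raw bytes without the trailing newline. Signing the head with the did:key is left out because the standard library has no Ed25519, so this sketch shows tamper-evidence but not attribution.

```python
import hashlib

GENESIS = b"workbooks-ledger-v1"


def chain(raw_lines):
    """Fold raw lines into a hash chain: h_i = sha256(h_{i-1} || line_i)."""
    head = hashlib.sha256(GENESIS).digest()  # assumption: h_0 = sha256(genesis string)
    count = 0
    for line in raw_lines:
        head = hashlib.sha256(head + line).digest()
        count += 1
    return count, head.hex()


def verify(raw_lines, sealed_count, sealed_head):
    """Tamper-evidence check only; the Ed25519 signature check is omitted here."""
    count, head = chain(raw_lines)
    return count == sealed_count and head == sealed_head


lines = b'{"step":1}\n{"step":2}\n'.splitlines()
count, head = chain(lines)
```

Because each hash depends on every line before it, editing, dropping, or reordering any line changes the head, which is why one signature over the head covers the whole file.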

record → SELF-EXTENSION

The last transformation closes a loop. When an agent hits a capability wall — its toolkit can't do the thing — it doesn't stall and it doesn't fake success. It files an issue with one field that matters: tried, the evidence of the wall, which is precisely a telemetry-shaped trace of what failed. The recorded failure becomes a request for a new capability.

flowchart LR
  wall["an agent hits a wall
tried: the failing trace"] --> fi["file_issue"]
  fi --> bl["the autopoet backlog
org files · SEEN dedup"]
  bl --> run["the autopoet works it
agent: autopoet"]
  run --> verify{"wb toolkit verify"}
  verify -- ok --> done["DONE"]
  verify -- unverified --> open["downgrade to OPEN"]
  style wall fill:#f3c5a3,stroke:#121316
  style done fill:#13d943,stroke:#121316,stroke-width:2.5px
  style open fill:#ffffff,stroke:#121316

The reply to the agent tells it to carry on — its job isn't to fix its own tools mid-run. The issue lands as an org file with a kind and a status; a duplicate from the same tenant bumps a SEEN count instead of re-filing, so the backlog is liberal to write but triaged by frequency. The autopoet picks the most-seen issue first and works it — and here's the part that matters for this page: the autopoet's own run goes through the same agent path, so its steps land in the same _steps.jsonl grammar, stamped agent: "autopoet". The system that extends the system is observable by the same telemetry it observes.

One guard is non-negotiable. A self-reported DONE is independently re-verified with wb toolkit verify; an unverified DONE is downgraded back to OPEN. The agent's word is a claim, not a proof — and the same honesty that runs through the record runs through the fix. The full account is the autopoet lesson's; this page only owes you the seam.
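The backlog rules (dedupe by bumping SEEN, work the most-seen issue first, re-verify every DONE) fit in a short Python sketch. It is in-memory rather than org files, the duplicate key is an assumption, and the verify callable stands in for running wb toolkit verify.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    tenant: str
    kind: str
    title: str
    tried: str          # the failing trace: evidence of the wall
    status: str = "OPEN"
    seen: int = 1


class Backlog:
    """Hypothetical in-memory stand-in for the org-file backlog."""

    def __init__(self):
        self.issues = {}

    def file_issue(self, tenant, kind, title, tried):
        key = (tenant, kind, title)  # assumption: what makes two filings duplicates
        if key in self.issues:
            self.issues[key].seen += 1  # bump SEEN instead of re-filing
        else:
            self.issues[key] = Issue(tenant, kind, title, tried)
        return "filed; carry on with the task"

    def next_issue(self):
        """The most-seen OPEN issue is worked first."""
        candidates = [i for i in self.issues.values() if i.status == "OPEN"]
        return max(candidates, key=lambda i: i.seen, default=None)

    def report_done(self, issue, verify):
        """A self-reported DONE stands only if an independent check agrees."""
        issue.status = "DONE" if verify(issue) else "OPEN"
        return issue.status
```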

the SECOND lane

Depth rung. There's a second stream of signal that is deliberately not the step grammar, because it answers a different question: ops metering. Every broker decision — every allow or deny of a capability request — increments an atomic counter keyed by broker and outcome, and denials land in a small forensics ring (the last 128), with the guest-controlled target truncated to 512 bytes so a hostile component can't exhaust memory through the audit itself.

The important move is at the boundary: alongside its own counters, the broker audit emits a standard Erlang :telemetry event — [:workbooks, :broker, outcome] with the broker, reason, and target. That's the well-known observability contract the whole BEAM ecosystem speaks, so Prometheus, a SIEM, or an APM like AppSignal can attach to the engine's security signal without coupling to any internal ETS layout. The step grammar is for you, reading your own runs; this lane is where the engine's signal meets the outside monitoring world on the world's terms.
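Here is a Python analog of that lane, for illustration only. The real boundary is Erlang's :telemetry library (execute, attach, attach_many); BrokerAudit, its attach method, and the "allow"/"deny" strings are hypothetical stand-ins that mirror the counters, the 128-entry ring, the 512-byte truncation, and the [:workbooks, :broker, outcome] event name.

```python
import threading
from collections import Counter, deque

RING_SIZE = 128     # last 128 denials kept for forensics
TARGET_LIMIT = 512  # bytes of guest-controlled target kept per record


class BrokerAudit:
    def __init__(self):
        self.counts = Counter()                  # (broker, outcome) -> count
        self.denials = deque(maxlen=RING_SIZE)   # oldest entries fall off
        self.handlers = []                       # (event names, handler)
        self.lock = threading.Lock()

    def attach(self, event_names, handler):
        """Rough analog of :telemetry.attach_many: exact event-name matches."""
        self.handlers.append((set(event_names), handler))

    def record(self, broker, outcome, reason, target):
        # Truncate by bytes, dropping any multibyte character cut in half.
        safe = target.encode("utf-8")[:TARGET_LIMIT].decode("utf-8", "ignore")
        with self.lock:
            self.counts[(broker, outcome)] += 1
            if outcome == "deny":
                self.denials.append((broker, reason, safe))
        event = ("workbooks", "broker", outcome)
        metadata = {"broker": broker, "reason": reason, "target": safe}
        for names, handler in self.handlers:
            if event in names:
                handler(event, metadata)
```

The bounded ring and the byte cap are what keep a hostile guest from turning the audit trail itself into a memory sink.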

private by DEFAULT

All of this telemetry is intensely personal — it's the minute-by-minute record of how your agent thinks. So the rule is one sentence: sharing exposes work, never the session that produced it. One module owns that boundary, and every egress path — git, bundle, library — consults it before anything leaves the machine.

sidecar file	what it is	written at	ships when shared?
`_steps.jsonl`	the always-on step log	every tool call	never
`_status.json`	stage — running / done / error	at stage transitions	never
`_trace.jsonl`	a slim per-step trace (out ≤140)	per step, web runs	never
`_telemetry.db`	run-end SQLite query db	at run end	never
`_ledger.json`	the signed seal	at workflow end	never

The boundary isn't a list someone has to maintain — it's a pattern. The _* prefix paired with a .jsonl / .json / .db suffix catches every sidecar here and any future one the same way, so a new telemetry file is private the day it's invented. The same module auto-writes a .gitignore, which makes git add -A safe by default — you can't accidentally commit your own session. When you share a workbook, the work goes and the diary stays.
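The pattern rule is easy to sketch in Python. The regex, the three gitignore lines, and the helper names are assumptions about how such a gate could look, not the module's actual code; matching is done on the file's basename so a sidecar at any depth is caught.

```python
import re
from pathlib import Path

# The _* prefix paired with a .jsonl / .json / .db suffix, matched on the basename.
SIDECAR = re.compile(r"_.*\.(jsonl|json|db)")
GITIGNORE_LINES = ("_*.jsonl", "_*.json", "_*.db")


def is_private(path):
    return SIDECAR.fullmatch(Path(path).name) is not None


def egress_filter(paths):
    """What an egress path (git, bundle, library) would be allowed to ship."""
    return [p for p in paths if not is_private(p)]


def write_gitignore(workbook_dir):
    """Add the sidecar patterns to .gitignore, keeping any existing lines."""
    target = Path(workbook_dir) / ".gitignore"
    existing = target.read_text(encoding="utf-8").splitlines() if target.exists() else []
    missing = [p for p in GITIGNORE_LINES if p not in existing]
    target.write_text("\n".join(existing + missing) + "\n", encoding="utf-8")
    return target
```

A future sidecar such as _future.db is private without anyone editing a list, which is the point of a pattern over an inventory.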

what it ISN'T

Honesty section, in full. The WASM-span lane is a complete host sink with no guest wiring yet — there is no telemetry capability for a sandboxed component to call it through, and no test exercises it. The spike confirmed it works; the guest-side transform tooling is the blocker. Treat sandbox self-narration as confirmed-feasible, not shipped.

The file is a record, not a transcript. Outputs are sliced to 200 characters in the jsonl line — enough to know what a step did and whether it worked, not enough to replay it verbatim. If you need the full output, you needed it at the moment it ran.

Workflow runs index under an ephemeral path (/tmp/bb). The live summary and index are real and free, but they're reading a working directory, not a warehouse — the durable copy is the run-end _telemetry.db and the sealed ledger, not the scan.

The step grammar is not an OTel exporter. Only the broker lane emits standard :telemetry events; the per-step record is its own shape, designed to be read by one file's worth of code, not piped into a vendor. And the log is editable until sealed — its tamper-evidence comes from the ledger's hash chain, applied at run end, not from the append itself.

Last and most important: none of this ever leaves your machine. This is your telemetry — the record you read to understand your own software — not product analytics, not a phone-home, not a metric we collect. The privacy section above isn't a setting. It's the default the whole egress path enforces.

questions people actually ASK

Can I watch a run live?

Yes, two ways. wb telemetry <slug> rolls up a run even mid-flight, because the summary reads the file directly and needs no finished database. And /api/run/:id/stream is a WebSocket that pushes a frame per step as it happens — read this, ran that — then a done frame. On a public workbook, /_activity shows the slimmed, anonymous version of the same feed.

Does my telemetry leave my machine?

No. The step log and its sidecars are private by construction — one module gates every egress path, and the _* naming pattern keeps them out of git, bundles, and the library automatically. Sharing a workbook ships the work, never the session that produced it. There is no collection, no phone-home, no analytics endpoint.

How do I hook up Prometheus or a SIEM?

Through the second lane. The broker audit emits standard Erlang :telemetry events — [:workbooks, :broker, outcome] with broker, reason, and target — which is the contract the BEAM observability ecosystem already speaks. Attach there and you get the engine's security signal without coupling to any internal layout. The per-step record is a different shape, meant for reading your own runs, not for scraping.

Can the agent fake its own log?

The event is appended at one chokepoint inside the loop, fired for every step regardless of the caller, so a step can't quietly skip itself. The log is editable after the fact, though, which is exactly why the ledger exists: a hash chain over the raw lines, signed with the tenant's key, makes any later edit detectable and the run attributable. Trust the seal, not the raw file.

Where do the files go when I share a workbook?

Nowhere. They stay. _steps.jsonl, _status.json, _trace.jsonl, _telemetry.db, and _ledger.json all match the private-by-default pattern, so the egress path leaves them behind. The recipient gets your work; they don't get your run's diary.

Why one file instead of proper logs, traces, and metrics?

Because for agent work the interesting unit is the tool call, and one event shape captures it whole. Splitting it across three subsystems buys you three formats to reconcile and three integrations to maintain. One append-only file means the summary, the website wire, the dream, and the ledger are all just readers — no new query code per reader, no drift between copies, because there's only one copy.

keep GOING

Telemetry is the nexus watching itself — and it feeds three of the most interesting ideas downstream. Start with the parent, then follow the transformations.

The nexus: the engine this watches

The autopoet: a recorded wall becomes a capability

Dreaming: telemetry digested into memory

The ledger: the same lines, sealed into proof