
a verdict PER dependency

The parent lesson left you at the honest frontier: most software crosses into the sandbox cleanly, some puts up a fight. The audit is the instrument that tells you which fight — a static, offline verdict on every carried script, in three words: ready · convertible · blocked. No build attempt, no vibes — and it writes the remaining work, as a checklist, into the toolkit itself.

the audit · 11 min read
[Illustration: a lone inspector stamps three towering monoliths with glowing verdicts (green READY, amber CONVERTIBLE, red BLOCKED) while cargo waits at a sandbox gate.]

will it CROSS?

You have a tool that works — a Claude skill with a few scripts, an MCP server config, an npm CLI you reach for daily. You want it inside the sandbox, where it composes with everything else. The parent lesson's honest frontier told you the truth in the abstract: most software crosses cleanly, some puts up a fight, and how rough the crossing is depends on the tool. Fine. But which fight — yours?

Everywhere else, the answer to that question is a shrug dressed as a workflow: attempt the build, watch it break, read the wreckage, guess what to change, attempt again. The feedback comes late, costs an engine, and arrives as a stack trace instead of a plan. Worse, it's per-build, not per-dependency — a red X on the whole thing tells you nothing about which of your six dependencies is the one that can't follow.

What you actually want is a verdict per dependency — assigned before you spend a single build — and the remaining work itemized, because the one doing the conversion is very likely an agent, and an agent needs a list, not a hunch. That is exactly the instrument this lesson is about.

the DEFINITION

au·dit /ˈɔː·dɪt/ noun

1. a static, offline classification of every carried script against the wasm lanes — three verdicts (ready · convertible · blocked), written into the toolkit's own manifest, with a checkbox fix-up plan for everything that isn't ready.

The module that does the work states its own creed in three words at the top of the file: static, offline, honest. Static, because it never runs your code — it reads it. Offline, because it needs no engine to render the report. Honest, because a diagnosis is not a failure: the bare audit always exits 0, even when every script is blocked. Telling you the truth is the success condition.

stage two of THREE

The audit doesn't stand alone — it's the middle of a three-stage intake ramp, the path an existing tool walks to become a toolkit. The spec calls them three honest stages, and the order matters:

  • Stage 1 — parse + scaffold. wbx toolkit import is deliberately parse-only and local-only: it reads the source tool, writes a manifest, a skills overview, and the carried scripts — and leaves a piece of bait, a ** TODO dependency audit heading, with a note saying the scripts are carried verbatim and not yet trusted to run in the sandbox.
  • Stage 2 — the dependency audit. This lesson. It classifies every carried script ready/convertible/blocked and writes the findings where the bait was.
  • Stage 3 — the fix-up plan. Generated in the very same pass as stage 2 — the agent manual, written right below the findings.

The crucial detail: stage 2 runs automatically at the end of every import. You don't have to remember to audit — the moment a tool is scaffolded, it's diagnosed, and the TODO is replaced with a real report. If the audit somehow errors, the import still succeeds and prints audit skipped — scaffolding a tool is never blocked on judging it. The philosophy line from the spec governs the whole ramp: parse what's parseable, do the work, and leave a manual — never a shrug.
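
A minimal sketch of that guarantee, in Python. The names `import_tool`, `scaffold`, and `audit` are hypothetical stand-ins, not wbx's actual functions; the only behavior taken from the text is that an audit error is caught, reported as audit skipped, and never fails the import.

```python
from typing import Callable


def import_tool(
    source: str,
    scaffold: Callable[[str], str],
    audit: Callable[[str], None],
) -> tuple[str, str]:
    """Scaffold first, then audit; an audit error never fails the import."""
    toolkit_dir = scaffold(source)      # stage 1: parse-only, local-only
    try:
        audit(toolkit_dir)              # stages 2+3: one static pass
        status = "audited"
    except Exception:
        status = "audit skipped"        # scaffolding is not blocked on judging
    return toolkit_dir, status


def broken_audit(path: str) -> None:
    """A hypothetical audit that crashes, to exercise the fallback."""
    raise RuntimeError("scanner crashed")


print(import_tool("my-skill", lambda s: f"./{s}-toolkit", broken_audit))
# ('./my-skill-toolkit', 'audit skipped')
```

The design choice is the ordering: the scaffold is already on disk before the audit gets a chance to fail.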

flowchart LR
  src["an existing tool<br/>skill · mcp · npm cli"]
  subgraph s1["stage 1 — import (parse-only)"]
    direction TB
    sc["scaffold:<br/>manifest.org · skills · scripts/"]
    bait["leaves bait:<br/>** TODO dependency audit"]
  end
  subgraph s23["stages 2+3 — one audit pass"]
    direction TB
    aud["audit_static<br/>scan · classify"]
    wm["write_into_manifest<br/>** dependency audit (static, auto)"]
    fp["fixup_plan<br/>** TODO fix-up plan [0/n]"]
  end
  src --> s1
  s1 -- "auto-runs at end of import" --> s23
  aud --> wm --> fp
  reentry["wbx toolkit audit <dir>"] -. "re-run anytime; replaces the last" .-> s23
  style s1 fill:#fbfaf6,stroke:#121316
  style s23 fill:#fbfaf6,stroke:#121316
  style sc fill:#f2ddb0,stroke:#121316
  style bait fill:#f3c5a3,stroke:#121316
  style aud fill:#ffffff,stroke:#121316
  style wm fill:#a8d4f0,stroke:#121316
  style fp fill:#aee5c2,stroke:#121316
  style src fill:#ffffff,stroke:#121316
  style reentry fill:#ffffff,stroke:#121316

three WORDS

The whole audit reduces every dependency to one of three classes, and each verdict is a commitment, not a mood:

  • ready — the lane covers it today. The posix shell shape, the quickjs lane, the C lane for jq. Push it; it runs.
  • convertible — a known recipe exists, but it needs work: a build, a shim, a re-route. The crossing is real but charted.
  • blocked — native-only. It needs a redesign, or an engine-side capability that has no sandbox equivalent. No recipe makes it cross as-is.

Those verdicts aren't guessed per run — they come from two fixed truth tables, one for interpreters (read from the shebang) and one for binaries (spotted at command position). Here is the heart of both, with the audit's own reasons attached:

| dependency | verdict | the audit's reason |
|---|---|---|
| sh · bash · zsh | ready | posix shape — shell runs in the sandbox |
| node · js | ready | quickjs lane — most of Node's surface; full-Node APIs may need shims |
| python · ruby · perl | blocked | no lane today — rewrite in a covered lane or split the logic |
| jq | ready | c lane — jq compiles to wasm cleanly |
| ffmpeg | ready | already a shipped toolkit — depend on it instead of bundling |
| curl · wget | convertible | network is engine-brokered — route through the Dock, not raw sockets |
| git | convertible | git exists engine-side — call through the engine, not a local binary |
| npm · npx · bun · node | convertible | npm lane exists — resolve/bundle at build time, not install at runtime |
| docker · podman | blocked | container runtimes can't nest in the sandbox — engine territory |
| sudo · systemctl · launchctl | blocked | host administration — has no sandbox meaning |
| osascript · open · xdg-open | blocked | host-desktop integration — no sandbox equivalent |
| brew · apt · yum | blocked | host package managers — dependencies must compile into the toolkit |

Two things to read off that table. First, node appears twice on purpose: as an interpreter it's ready (you're running JS, and that's the quickjs lane), but as a binary you shell out to, it's convertible (that's an install-at-runtime pattern, which the npm lane replaces with resolve-and-bundle at build time). The audit judges the role, not just the name. Second, ffmpeg is ready not because it's small, but because it's already a shipped toolkit. The right move is to depend on it, which is the whole composition story made concrete in one verdict.
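
The role-versus-name point can be sketched as two lookup tables. The verdicts below are copied from the table in this lesson; the dict layout and the `verdict` function are assumptions for illustration, not the audit module's real code, and the handling of unknown interpreters is not modeled.

```python
from typing import Optional

# Verdicts copied from the lesson's table; the data layout is an assumption.
INTERPRETERS = {
    "sh": "ready", "bash": "ready", "zsh": "ready",
    "node": "ready", "js": "ready",
    "python": "blocked", "ruby": "blocked", "perl": "blocked",
}
BINARIES = {
    "jq": "ready", "ffmpeg": "ready",
    "curl": "convertible", "wget": "convertible", "git": "convertible",
    "npm": "convertible", "npx": "convertible",
    "bun": "convertible", "node": "convertible",
    "docker": "blocked", "podman": "blocked",
    "sudo": "blocked", "systemctl": "blocked", "launchctl": "blocked",
    "osascript": "blocked", "open": "blocked", "xdg-open": "blocked",
    "brew": "blocked", "apt": "blocked", "yum": "blocked",
}


def verdict(role: str, name: str) -> Optional[str]:
    """Look a name up by role. For binaries, None means 'no verdict'."""
    table = INTERPRETERS if role == "interpreter" else BINARIES
    return table.get(name)


print(verdict("interpreter", "node"))  # ready
print(verdict("binary", "node"))       # convertible
print(verdict("binary", "rsync"))      # None (rsync is just an example of an unlisted tool)
```

Keeping the two tables separate is what lets the same name carry two different verdicts.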

how it LOOKS without running

depth rung · skippable — the heuristics, for the curious

Static means the audit reads text and reasons about it — never executes it. The detection is a stack of small, honest heuristics:

  • Interpreter, from the shebang. The first line's #! names the interpreter; #!/usr/bin/env node is handled by taking the token after env. No shebang? Fall back to the extension — .sh→sh, .js/.mjs/.cjs→node, .py→python, .rb→ruby.
  • Binaries, at command position. A shell-ish heuristic that's good enough offline: each line is split on the shell operators | ; & ( `, the first word of each segment is taken as a command, a leading $ is stripped, and the set is deduped. Comment lines (#, //) are skipped.
  • npm deps, by sniffing imports. Lines with require( or from yield their non-relative specifier — each becomes a convertible finding: npm lane — resolve + bundle at toolkit build time.
  • pip deps, only under python. When (and only when) the interpreter is python, import X / from X lines become blocked findings: no python lane.
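
The first three heuristics above can be sketched in a few lines of Python. This is an illustration under assumptions, not the scanner's source: the function names, the regular expressions, and the first-seen ordering of the deduped commands are all hypothetical choices.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Extension fallback as listed in the lesson.
EXT_FALLBACK = {".sh": "sh", ".js": "node", ".mjs": "node", ".cjs": "node",
                ".py": "python", ".rb": "ruby"}
OPERATORS = re.compile(r"[|;&(`]")          # the shell operators named above
SPECIFIER = re.compile(r"""(?:require\(\s*|\bfrom\s+)['"]([^'"]+)['"]""")


def detect_interpreter(filename: str, text: str) -> Optional[str]:
    """Shebang first (token after `env` if present), else the extension."""
    lines = text.splitlines()
    first = lines[0] if lines else ""
    if first.startswith("#!"):
        tokens = first[2:].split()
        if tokens:
            name = PurePosixPath(tokens[0]).name
            if name == "env" and len(tokens) > 1:
                name = tokens[1]
            return name
    return EXT_FALLBACK.get(PurePosixPath(filename).suffix)


def command_words(text: str) -> list[str]:
    """First word of each operator-separated segment, deduped."""
    seen: dict[str, None] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "//")):
            continue                         # comment lines are skipped
        for segment in OPERATORS.split(stripped):
            words = segment.split()
            if words:
                word = words[0].removeprefix("$")
                if word:
                    seen.setdefault(word, None)
    return list(seen)


def npm_specifiers(text: str) -> list[str]:
    """Non-relative specifiers from `require(` and `from` lines."""
    found: list[str] = []
    for line in text.splitlines():
        for spec in SPECIFIER.findall(line):
            if not spec.startswith((".", "/")) and spec not in found:
                found.append(spec)
    return found


SCRIPT = """#!/usr/bin/env bash
# fetch and filter
curl -s https://example.com/data.json | jq '.items'
git status; curl -I https://example.com
"""
print(detect_interpreter("fetch.sh", SCRIPT))  # bash
print(command_words(SCRIPT))                   # ['curl', 'jq', 'git']
```

Note that the split is purely textual: it will happily treat a fragment like `x=$` as a "command". That is harmless only because unrecognized words produce no finding.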

The most important line in the whole scanner is the one about what it doesn't say. The binary table is a deliberately short list — the handful that genuinely can't follow, or genuinely can. Every other binary returns nothing: unremarkable or unknown — don't speculate. The audit stays silent about tools it doesn't recognize, on purpose. Silence is not endorsement; it's the refusal to invent a verdict it can't defend.

worst-of WINS

A script usually has more than one dependency, so the audit needs a rule for rolling its findings up into one verdict. The rule is severity, worst-of: ready (0) < convertible (1) < blocked (2). A script's class is the maximum severity of any finding inside it, and the summary counts roll up the same way.

The consequence is bracing and correct: one finding poisons the whole script. A hundred clean lines of bash and a single docker call, and the verdict is blocked, because that one line is what will stop it crossing, and an audit that averaged it away would be lying to you. Walk a real script through it:

| finding in fetch.sh | finding verdict | severity |
|---|---|---|
| bash interpreter | ready | 0 |
| curl binary | convertible | 1 |
| → fetch.sh rolls up to | convertible | (1) |

The shell itself was fine; the curl is the part that needs a re-route, so convertible is the honest verdict — the worst of what's inside, surfaced as the verdict of the whole.
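
The roll-up rule is small enough to write out. A sketch, assuming findings arrive as plain verdict strings; treating a script with no findings as ready, and counting scripts per class in the summary, are assumptions about details the text leaves open.

```python
SEVERITY = {"ready": 0, "convertible": 1, "blocked": 2}


def roll_up(finding_verdicts: list[str]) -> str:
    """A script's class is the worst verdict among its findings."""
    # Assumption: a script with no findings counts as ready.
    return max(finding_verdicts, key=SEVERITY.__getitem__, default="ready")


def summarize(scripts: dict[str, list[str]]) -> dict[str, int]:
    """Count scripts per rolled-up class."""
    counts = {name: 0 for name in SEVERITY}
    for verdicts in scripts.values():
        counts[roll_up(verdicts)] += 1
    return counts


# fetch.sh from the table above: bash (ready) plus curl (convertible).
print(roll_up(["ready", "convertible"]))  # convertible
```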

the artifact carries its own AUDIT

Here is the move that makes this more than a linter. The findings don't go to a log or a terminal you'll close — they go into the toolkit's own manifest.org. The audit finds the stage-1 ** TODO dependency audit heading (or the previous audit section, if you've run before), truncates from there, and appends a fresh section. It's idempotent by design: every run replaces the last, so the carried audit can never go stale.

This is the parent lesson's whole thesis — the manual is the interface — recursed one level down. The parent said a toolkit's documentation lives inside the toolkit so it can't drift from the tool. The audit's findings live in the same place for the same reason: a verdict that lives in the artifact can't disagree with the artifact.
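
The truncate-and-append step is what makes re-runs safe, and it can be sketched directly. The two heading strings come from this lesson; the function body is a hypothetical reconstruction that works on the manifest as a string, and appending at the end when neither heading is found is an assumption.

```python
BAIT = "** TODO dependency audit"
AUDIT = "** dependency audit (static, auto)"


def write_into_manifest(manifest: str, report_body: str) -> str:
    """Cut from the bait (or the previous audit section), append a fresh one."""
    lines = manifest.splitlines()
    for i, line in enumerate(lines):
        if line.startswith((BAIT, AUDIT)):
            lines = lines[:i]               # truncate from the heading onward
            break
    return "\n".join(lines + [AUDIT, report_body.rstrip("\n")]) + "\n"


scaffolded = (
    "* demo\n"
    "** skills\n"
    "   overview\n"
    "** TODO dependency audit\n"
    "   The import was parse-only.\n"
)
once = write_into_manifest(scaffolded, "*** calc.py")
twice = write_into_manifest(once, "*** calc.py")
print(once == twice)  # True
```

Because everything from the heading down is regenerated on each run, running the function on its own output changes nothing.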

Concretely. Stage 1 scaffolds this bait:

** TODO dependency audit
   The import was parse-only. Next: the wasm compatibility audit classifies every
   carried script ready/convertible/blocked and writes the fix-up plan here.

After the audit runs, that heading is gone, replaced by the real findings — one sub-heading per script, one line per finding, each carrying its kind, name, verdict, and the reason:

** dependency audit (static, auto)
*** calc.py — blocked (python3)
    - interpreter =python3= :: blocked — no python lane today — rewrite in a covered lane or split the logic
    - pip =numpy= :: blocked — no python lane

And the empty case is just as honest. A toolkit with no scripts/ directory — pure guidance, manuals only — gets a one-line section that says exactly that: no carried scripts — guidance-only toolkit, nothing to convert. The audit never pads, and never pretends there's work where there isn't.

the agent MANUAL

The findings tell you what can't cross. Stage 3 — the fix-up plan, written in the same pass, right below the findings — tells you what to do about it, in a grammar an agent already executes. Only the non-ready scripts become work; a fully-ready toolkit gets a short, honest ** fix-up plan / nothing to fix — every script is sandbox-ready, plus the line that proves it.

Otherwise the plan opens with a real org checkbox-statistics cookie — ** TODO fix-up plan [0/n], where n is the count of non-ready scripts — and a preamble that names the done-test plainly. Then, per script, a *** TODO headline, a - [ ] checkbox for each concrete recipe step, and always a final checkbox: re-run the audit, this file must classify ready. Continuing the same toolkit from above:

** TODO fix-up plan [0/3]
   The agent manual: work each item, check it off, then prove the whole
   toolkit — done when =wbx toolkit push demo <dir>= · =wbx toolkit build demo=
   · =wbx toolkit verify demo= all pass. With an engine reachable,
   =wbx toolkit audit <dir> --fix= runs that push→build→verify for you.
*** TODO calc.py (blocked — python3)
    - [ ] rewrite in JS for the quickjs lane — keep the script's CLI contract (same args in, same stdout out) so callers don't change
    - [ ] or split the logic into org tasks the engine runs natively
    - [ ] =numpy= goes away with the python rewrite (see the interpreter item)
    - [ ] re-run =wbx toolkit audit= — calc.py must classify ready

Two things are doing the real work here. The [0/3] cookie is a live progress meter — work items, check them, watch the fraction climb — and it's plain org grammar, the same TODO grammar agents already read off boards. And the final checkbox is the masterstroke: the done-test for converting the script is re-running the very tool that judged it. The plan is machine-checkable because its acceptance criterion is another run of the audit.

The recipes are specific, not generic. A python/ruby/perl interpreter gets rewrite in JS for the quickjs lane — keep the script's CLI contract. A curl gets route HTTP through the Dock — in JS use fetch (engine-shimmed); in shell, call the engine's http capability from a task. An unknown interpreter gets identify the language; if it's in a compile lane (c/zig/rust/go) declare a build recipe, then wbx toolkit build produces the wasm. Each verdict knows its own way across.
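
A sketch of how such a plan could be rendered for the non-ready case. The cookie, the per-script TODO headline, and the closing re-run checkbox follow the example above; the input shape is hypothetical, the headline and checkbox wording are abbreviated from the real output, and the fully-ready case is deliberately not modeled.

```python
def fixup_plan(non_ready: dict[str, dict]) -> str:
    """Render an org plan: cookie heading, one TODO per script, checkboxes."""
    if not non_ready:
        raise ValueError("the fully-ready case is rendered differently")
    lines = [f"** TODO fix-up plan [0/{len(non_ready)}]"]
    for name, item in non_ready.items():
        lines.append(f"*** TODO {name} ({item['verdict']})")
        for step in item["steps"]:
            lines.append(f"    - [ ] {step}")
        # Every script ends with the same done-test: another audit run.
        lines.append(f"    - [ ] re-run =wbx toolkit audit=; {name} must classify ready")
    return "\n".join(lines) + "\n"


plan = fixup_plan({
    "calc.py": {
        "verdict": "blocked",
        "steps": ["rewrite in JS for the quickjs lane",
                  "or split the logic into org tasks the engine runs natively"],
    },
    "fetch.sh": {
        "verdict": "convertible",
        "steps": ["route HTTP through the Dock"],
    },
})
print(plan)
```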

closing the LOOP

The bare audit diagnoses. Add --fix and it converts — or rather, it runs the exact three commands the plan's preamble named, against a live engine: push → build → verify. Push zips the directory and installs it on the engine; build runs the toolkit's build (resolving and bundling the npm lane, compiling the C/Zig/Rust lanes); verify checks the result. These are real RCP calls to /rcp/toolkit/install, /build, and /verify — the same vocabulary the parent lesson used in its terminal story.

sequenceDiagram
  participant wbx as wbx (your machine)
  participant eng as engine (Nexus)
  wbx->>wbx: audit_static — scan · classify · write manifest
  alt --fix and an engine is reachable
    wbx->>eng: install(zip) — /rcp/toolkit/install
    eng-->>wbx: pushed
    wbx->>eng: build — /rcp/toolkit/build
    eng-->>wbx: built
    wbx->>eng: verify — /rcp/toolkit/verify
    eng-->>wbx: verified ✓
    Note over wbx: exit 0 · fix: pushed → built → verified
  else --fix but no engine
    wbx->>eng: install …
    eng--xwbx: unreachable
    Note over wbx: exit 3 — start one with wbx deploy local
  end
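
The same two branches, as a Python sketch. The three steps are injected as callables so the example runs without an engine; `EngineUnreachable` and `run_fix` are hypothetical names, and only the exit codes 0 and 3 and the summary line are taken from this lesson.

```python
from typing import Callable


class EngineUnreachable(Exception):
    """Hypothetical signal that no engine answered."""


def run_fix(
    name: str,
    push: Callable[[], None],
    build: Callable[[], None],
    verify: Callable[[], None],
) -> tuple[int, str]:
    """Run push, build, verify in order; return (exit code, summary line)."""
    try:
        push()
        build()
        verify()
    except EngineUnreachable:
        return 3, "no engine reachable"
    return 0, f"fix: pushed {name} → built → verified"


def ok() -> None:
    pass


def down() -> None:
    raise EngineUnreachable()


print(run_fix("demo", ok, ok, ok))    # (0, 'fix: pushed demo → built → verified')
print(run_fix("demo", down, ok, ok))  # (3, 'no engine reachable')
```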
  

The output shapes match the two readers. For a human, the report prints, then a line: fix: pushed demo → built → verified, followed by the engine's structural verify lines. For an agent, --json wraps it in the mode envelope — the data carries { audit: {…}, fix: { pushed, built, verified } } — so a machine reads the result without scraping prose.
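
For the agent side, a hedged sketch of reading that result. The text only says the data carries `audit` and `fix` with `pushed`, `built`, and `verified`; the top-level key name `data` and the idea that the three values are truthy on success are assumptions here, so check them against real `--json` output before relying on this.

```python
import json


def fix_succeeded(stdout: str) -> bool:
    """True only if all three fix steps are reported as truthy."""
    envelope = json.loads(stdout)
    # Assumption: the payload sits under a top-level "data" key.
    fix = envelope.get("data", {}).get("fix") or {}
    return all(fix.get(step) for step in ("pushed", "built", "verified"))


# A made-up envelope for illustration, not captured tool output.
sample = json.dumps({
    "data": {"audit": {}, "fix": {"pushed": True, "built": True, "verified": True}}
})
print(fix_succeeded(sample))  # True
```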

And the failure modes are stable, not ad-hoc. The bare audit on a broken setup still exits 0 — a diagnosis isn't a failure. But --fix with no engine reachable exits 3, with the hint no engine reachable — start one with wbx deploy local or set WB_ENGINE_URL. That 3 is part of a fixed exit-code map every wbx verb shares, so agents can branch on it:

| exit | means | what an agent does |
|---|---|---|
| 0 | ok / diagnosis written | read the report, work the plan |
| 3 | engine unreachable | start an engine, retry --fix |
| 4 | not found | fix the id or path |
| 5 | verification failed | the conversion is wrong — re-open the plan |
| 6 | conflict | resolve the clashing state |
| 7 | auth rejected | fix credentials |
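
Branching on that map is a dictionary lookup. A sketch: the codes and actions are the table's, the `wbx toolkit audit <dir> --fix` invocation is the one the plan preamble names, and the function names and the fallback for unmapped codes are hypothetical.

```python
import subprocess

# Codes and actions copied from the exit-code table.
EXIT_ACTIONS = {
    0: "read the report, work the plan",
    3: "start an engine, retry --fix",
    4: "fix the id or path",
    5: "the conversion is wrong; re-open the plan",
    6: "resolve the clashing state",
    7: "fix credentials",
}


def next_action(exit_code: int) -> str:
    """Map a wbx exit code to the agent's next move."""
    return EXIT_ACTIONS.get(exit_code, "unmapped exit code: stop and report")


def audit_and_fix(toolkit_dir: str) -> str:
    """Run the audit with --fix and decide what to do next (needs wbx on PATH)."""
    result = subprocess.run(["wbx", "toolkit", "audit", toolkit_dir, "--fix"])
    return next_action(result.returncode)


print(next_action(3))  # start an engine, retry --fix
```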

One honest note about what verify checks. The engine-side verify is structural — manifest present, skills overview present, exec/cap/trust checks — and it explicitly reports that native :role pre bash blocks are disabled (the platform's native-exec ban). Verify confirms the toolkit is well-formed and compatible. It does not confirm it's correct — that's a different gate.

what the audit will NOT tell you

Honesty section. The audit is sharp precisely because it knows its edges, and you should too.

  • Scope is scripts/ only. The file scan reads the files directly in the toolkit's scripts/ directory — non-recursive, sorted, and nothing else. No subdirectories, no other folders. A toolkit with no scripts/ gets an empty audit. (The spec's stage-2 wording gestures at scanning for network and filesystem expectations; today's code doesn't do that — there's no manifest-dependency scan, no filesystem-expectation detection. This lesson sides with the code.)
  • The parse is heuristic. The command-position split is good enough offline, not a shell parser. It reads the common shapes well and won't catch every exotic construction. It's a diagnosis, not a proof.
  • Silence is not a blessing. A binary the audit doesn't flag isn't certified safe — it's unrecognized, and the audit refuses to speculate. Absence of a verdict is absence of a verdict.
  • "Blocked" means no lane today. The python verdict is dated, not eternal. It reflects the lanes that exist now; the day a python lane ships, that verdict changes. The audit reports the present, not a permanent law.
  • Compatibility is not correctness. Ready means it'll run in the sandbox, not that it does the right thing. A script can be perfectly ready and perfectly wrong. Verify is structural; evals are the separate gate for whether the behavior is any good.
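
The scope rule in the first bullet is easy to pin down in code. A sketch, assuming the toolkit is a plain directory on disk; `carried_scripts` is a hypothetical name, and the temporary directory exists only to demonstrate what is and isn't picked up.

```python
import tempfile
from pathlib import Path


def carried_scripts(toolkit_dir: str) -> list[Path]:
    """Files directly inside scripts/: non-recursive, sorted, nothing else."""
    scripts = Path(toolkit_dir) / "scripts"
    if not scripts.is_dir():
        return []                           # guidance-only toolkit
    return sorted(p for p in scripts.iterdir() if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scripts" / "nested").mkdir(parents=True)
    (root / "scripts" / "b.sh").write_text("#!/bin/sh\n")
    (root / "scripts" / "a.js").write_text("console.log(1)\n")
    (root / "scripts" / "nested" / "c.py").write_text("print(1)\n")  # subdirectory: ignored
    (root / "notes.sh").write_text("#!/bin/sh\n")                    # outside scripts/: ignored
    names = [p.name for p in carried_scripts(tmp)]

print(names)  # ['a.js', 'b.sh']
```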

questions people actually ASK

Why did my manifest.org change after import?

Because the audit auto-runs at the end of every import and writes its findings straight into the manifest — that's the design, not a side effect. It replaced the ** TODO dependency audit placeholder with the real report and the fix-up plan. Re-running the audit replaces that section again, so it never goes stale. The artifact carries its own audit on purpose.

It said ready — why did the build still fail?

Ready is a static verdict about lane coverage, not a build attempt. The node verdict says it plainly: the quickjs lane covers most of Node's surface, but full-Node APIs may need shims. Ready means the lane exists; it doesn't promise every API your script reaches is shimmed. The build is where you find the remaining gaps — the audit just gets you there with far fewer surprises.

Python is blocked — forever?

No. Blocked means no lane today. The verdict reflects the lanes that currently exist; it isn't a permanent judgment. The honest move right now is the one the plan prescribes: rewrite the logic in JS for the quickjs lane and keep the same CLI contract, or split it into org tasks the engine runs natively.

It didn't flag a weird binary I use — am I safe?

Not necessarily — you're unjudged. The binary table is a deliberately short list of the tools that genuinely can or can't follow; everything else returns nothing, because the audit won't speculate about a tool it doesn't recognize. Silence means "no verdict," not "approved." If it matters, test the crossing yourself.

Do I need an engine to audit?

No. The bare audit is fully static and offline — it scans, classifies, writes the manifest, and exits 0, no engine required. You only need a reachable engine for --fix, which actually pushes, builds, and verifies. Without an engine, --fix exits 3 with a hint; the diagnosis itself never needs one.

What does an agent actually do with the fix-up plan?

It works it like any other plan. The [0/3] cookie and the - [ ] checkboxes are plain org TODO grammar; the agent does each recipe step, checks it off, and — because the last checkbox of every script is "re-run the audit, must classify ready" — it re-runs the audit to confirm. When the whole toolkit is ready, it proves it with push → build → verify, which is exactly what --fix automates.

keep GOING

This is the machinery under the parent's honest frontier. From here, follow the ramp out in either direction.