will it CROSS?
You have a tool that works — a Claude skill with a few scripts, an MCP server config, an npm CLI you reach for daily. You want it inside the sandbox, where it composes with everything else. The parent lesson's honest frontier told you the truth in the abstract: most software crosses cleanly, some puts up a fight, and how rough the crossing is depends on the tool. Fine. But which fight — yours?
Everywhere else, the answer to that question is a shrug dressed as a workflow: attempt the build, watch it break, read the wreckage, guess what to change, attempt again. The feedback comes late, costs an engine, and arrives as a stack trace instead of a plan. Worse, it's per-build, not per-dependency — a red X on the whole thing tells you nothing about which of your six dependencies is the one that can't follow.
What you actually want is a verdict per dependency — assigned before you spend a single build — and the remaining work itemized, because the one doing the conversion is very likely an agent, and an agent needs a list, not a hunch. That is exactly the instrument this lesson is about.
the DEFINITION
1. a static, offline classification of every carried script against the wasm lanes — three verdicts (ready · convertible · blocked), written into the toolkit's own manifest, with a checkbox fix-up plan for everything that isn't ready.
The module that does the work states its own creed in three words at the top of the file: static, offline, honest. Static, because it never runs your code — it reads it. Offline, because it needs no engine to render the report. Honest, because a diagnosis is not a failure: the bare audit always exits 0, even when every script is blocked. Telling you the truth is the success condition.
stage two of THREE
The audit doesn't stand alone — it's the middle of a three-stage intake ramp, the path an existing tool walks to become a toolkit. The spec calls them three honest stages, and the order matters:
- Stage 1: parse + scaffold. wbx toolkit import is deliberately parse-only and local-only: it reads the source tool, writes a manifest, a skills overview, and the carried scripts, and leaves a piece of bait, a ** TODO dependency audit heading, with a note saying the scripts are carried verbatim and not yet trusted to run in the sandbox.
- Stage 2: the dependency audit. This lesson. It classifies every carried script ready/convertible/blocked and writes the findings where the bait was.
- Stage 3: the fix-up plan. Generated in the very same pass as stage 2. It is the agent manual, written right below the findings.
The crucial detail: stage 2 runs automatically at the end of every import. You
don't have to remember to audit — the moment a tool is scaffolded, it's diagnosed, and the
TODO is replaced with a real report. If the audit somehow errors, the import still
succeeds and prints audit skipped — scaffolding a tool is never blocked on
judging it. The philosophy line from the spec governs the whole ramp: parse what's
parseable, do the work, and leave a manual — never a shrug.
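The "import never blocks on the audit" rule can be sketched in a few lines. This is a minimal illustration, not the tool's code: `import_tool`, `scaffold`, and `audit` are hypothetical names, and the two callables stand in for stage 1 and the combined stage 2+3 pass.

```python
from typing import Callable


def import_tool(scaffold: Callable[[], str], audit: Callable[[str], None]) -> list[str]:
    """Scaffold first, then attempt the audit; an audit error never fails the import.

    Both callables are hypothetical stand-ins for the real stages.
    """
    log: list[str] = []
    toolkit_dir = scaffold()                 # stage 1: parse + scaffold
    log.append(f"scaffolded {toolkit_dir}")
    try:
        audit(toolkit_dir)                   # stages 2+3: one audit pass
        log.append("audit written")
    except Exception as err:                 # judging must not block scaffolding
        log.append(f"audit skipped: {err}")
    return log
```

The only design point worth noting is the ordering: the scaffold result is recorded before the audit is attempted, so a failing audit can only add a line to the log, never remove the scaffold.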
flowchart LR
  src["an existing tool<br/>skill · mcp · npm cli"]
  subgraph s1["stage 1 — import (parse-only)"]
    direction TB
    sc["scaffold:<br/>manifest.org · skills · scripts/"]
    bait["leaves bait:<br/>** TODO dependency audit"]
  end
  subgraph s23["stages 2+3 — one audit pass"]
    direction TB
    aud["audit_static<br/>scan · classify"]
    wm["write_into_manifest<br/>** dependency audit (static, auto)"]
    fp["fixup_plan<br/>** TODO fix-up plan [0/n]"]
  end
  src --> s1
  s1 -- "auto-runs at end of import" --> s23
  aud --> wm --> fp
  reentry["wbx toolkit audit <dir>"] -. "re-run anytime; replaces the last" .-> s23
  style s1 fill:#fbfaf6,stroke:#121316
  style s23 fill:#fbfaf6,stroke:#121316
  style sc fill:#f2ddb0,stroke:#121316
  style bait fill:#f3c5a3,stroke:#121316
  style aud fill:#ffffff,stroke:#121316
  style wm fill:#a8d4f0,stroke:#121316
  style fp fill:#aee5c2,stroke:#121316
  style src fill:#ffffff,stroke:#121316
  style reentry fill:#ffffff,stroke:#121316
three WORDS
The whole audit reduces every dependency to one of three classes, and each verdict is a commitment, not a mood:
- ready: the lane covers it today. The posix shell shape, the quickjs lane, the C lane for jq. Push it; it runs.
- convertible: a known recipe exists, but it needs work (a build, a shim, a re-route). The crossing is real but charted.
- blocked: native-only. It needs a redesign, or an engine-side capability that has no sandbox equivalent. No recipe makes it cross as-is.
Those verdicts aren't guessed per run — they come from two fixed truth tables, one for interpreters (read from the shebang) and one for binaries (spotted at command position). Here is the heart of both, with the audit's own reasons attached:
| dependency | verdict | the audit's reason |
|---|---|---|
| sh · bash · zsh | ready | posix shape — shell runs in the sandbox |
| node · js | ready | quickjs lane — most of Node's surface; full-Node APIs may need shims |
| python · ruby · perl | blocked | no lane today — rewrite in a covered lane or split the logic |
| jq | ready | c lane — jq compiles to wasm cleanly |
| ffmpeg | ready | already a shipped toolkit — depend on it instead of bundling |
| curl · wget | convertible | network is engine-brokered — route through the Dock, not raw sockets |
| git | convertible | git exists engine-side — call through the engine, not a local binary |
| npm · npx · bun · node | convertible | npm lane exists — resolve/bundle at build time, not install at runtime |
| docker · podman | blocked | container runtimes can't nest in the sandbox — engine territory |
| sudo · systemctl · launchctl | blocked | host administration — has no sandbox meaning |
| osascript · open · xdg-open | blocked | host-desktop integration — no sandbox equivalent |
| brew · apt · yum | blocked | host package managers — dependencies must compile into the toolkit |
Two things to read off that table. First, node appears twice on purpose:
as an interpreter it's ready (you're running JS, and that's the quickjs lane), but as a
binary you shell out to it's convertible (you're installing at runtime, which
the npm lane wants to replace with a build-time bundle). The audit judges the role, not just the name.
Second, ffmpeg is ready not because it's small, but because it's already a
shipped toolkit — the right move is to depend on it, which is the
whole composition story made concrete in one verdict.
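To make the role-before-name lookup concrete, here is a minimal sketch of the two tables as plain dictionaries. The entries are copied from the rows above; the names `INTERPRETERS`, `BINARIES`, and `verdict` are hypothetical, the real tables may hold more entries, and how the real audit treats an unrecognized interpreter is not modeled here.

```python
from typing import Optional

# Excerpt of the interpreter table (keyed by the name read from the shebang).
INTERPRETERS = {
    "sh": "ready", "bash": "ready", "zsh": "ready",
    "node": "ready", "js": "ready",
    "python": "blocked", "ruby": "blocked", "perl": "blocked",
}

# Excerpt of the binary table (keyed by the word found at command position).
BINARIES = {
    "jq": "ready", "ffmpeg": "ready",
    "curl": "convertible", "wget": "convertible", "git": "convertible",
    "npm": "convertible", "npx": "convertible", "bun": "convertible",
    "node": "convertible",
    "docker": "blocked", "podman": "blocked",
    "sudo": "blocked", "systemctl": "blocked", "launchctl": "blocked",
    "osascript": "blocked", "open": "blocked", "xdg-open": "blocked",
    "brew": "blocked", "apt": "blocked", "yum": "blocked",
}


def verdict(role: str, name: str) -> Optional[str]:
    """Pick the table by role, then look up the name; absent names get no verdict."""
    table = INTERPRETERS if role == "interpreter" else BINARIES
    return table.get(name)
```

With this shape, `verdict("interpreter", "node")` and `verdict("binary", "node")` disagree by construction, which is the point of keeping two tables rather than one.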
how it LOOKS without running
depth rung · skippable — the heuristics, for the curious
Static means the audit reads text and reasons about it — never executes it. The detection is a stack of small, honest heuristics:
- Interpreter, from the shebang. The first line's #! names the interpreter; #!/usr/bin/env node is handled by taking the token after env. No shebang? Fall back to the extension: .sh → sh, .js/.mjs/.cjs → node, .py → python, .rb → ruby.
- Binaries, at command position. A shell-ish heuristic that's good enough offline: each line is split on the shell operators | ; & ( `, the first word of each segment is taken as a command, a leading $ is stripped, and the set is deduped. Comment lines (#, //) are skipped.
- npm deps, by sniffing imports. Lines with require( or from yield their non-relative specifier; each becomes a convertible finding: npm lane — resolve + bundle at toolkit build time.
- pip deps, only under python. When (and only when) the interpreter is python, import X / from X lines become blocked findings: no python lane.
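The first three heuristics can be sketched as follows. This is an illustrative reading of the description above, not the scanner's actual code: the function names and regular expressions are assumptions, and the pip rule is left out because the text does not say how standard-library imports are treated.

```python
import re
from typing import Optional

EXT_TO_INTERP = {".sh": "sh", ".js": "node", ".mjs": "node", ".cjs": "node",
                 ".py": "python", ".rb": "ruby"}

# Assumed pattern: a quoted specifier right after `require(` or `from`.
IMPORT_RE = re.compile(r"(?:require\(\s*|from\s+)['\"]([^'\"]+)['\"]")


def detect_interpreter(filename: str, text: str) -> Optional[str]:
    """Shebang first (taking the token after `env`), then the file extension."""
    lines = text.splitlines()
    first = lines[0] if lines else ""
    if first.startswith("#!"):
        tokens = first[2:].split()
        if tokens:
            prog = tokens[0].rsplit("/", 1)[-1]
            if prog == "env" and len(tokens) > 1:
                return tokens[1]
            return prog
    for ext, interp in EXT_TO_INTERP.items():
        if filename.endswith(ext):
            return interp
    return None


def command_words(text: str) -> set[str]:
    """First word of every segment after splitting on | ; & ( and backtick."""
    found: set[str] = set()
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "//")):
            continue                      # comment lines (and the shebang) are skipped
        for segment in re.split(r"[|;&(`]", stripped):
            words = segment.split()
            if not words:
                continue
            word = words[0][1:] if words[0].startswith("$") else words[0]
            if word:
                found.add(word)
    return found


def npm_specifiers(text: str) -> set[str]:
    """Non-relative specifiers from require(...) and from '...' lines."""
    return {spec for spec in IMPORT_RE.findall(text) if not spec.startswith(".")}
```

Note how little shell this understands: quoting, here-documents, and variable assignments all pass through the same split, which is why the result is a diagnosis rather than a parse.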
The most important line in the whole scanner is the one about what it doesn't say. The binary table is a deliberately short list — the handful that genuinely can't follow, or genuinely can. Every other binary returns nothing: unremarkable or unknown — don't speculate. The audit stays silent about tools it doesn't recognize, on purpose. Silence is not endorsement; it's the refusal to invent a verdict it can't defend.
worst-of WINS
A script usually has more than one dependency, so the audit needs a rule for rolling its findings up into one verdict. The rule is severity, worst-of: ready (0) < convertible (1) < blocked (2). A script's class is the maximum severity of any finding inside it, and the summary counts roll up the same way.
The consequence is bracing and correct: one finding poisons the whole script. A
hundred clean lines of bash and a single import numpy, and the verdict is
blocked — because that one line is what will stop it crossing, and an audit that averaged
it away would be lying to you. Walk a real script through it:
| finding in fetch.sh | finding verdict | severity |
|---|---|---|
| bash interpreter | ready | 0 |
| curl binary | convertible | 1 |
| → fetch.sh rolls up to | convertible | 1 |
The shell itself was fine; the curl is the part that needs a re-route, so
convertible is the honest verdict — the worst of what's inside, surfaced as the verdict of
the whole.
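The roll-up is a maximum over a three-value ordering, which a short sketch makes explicit. The names `SEVERITY`, `roll_up`, and `summarize` are hypothetical; treating a script with no findings as ready, and counting scripts per class in the summary, are assumptions about details the text leaves open.

```python
SEVERITY = {"ready": 0, "convertible": 1, "blocked": 2}


def roll_up(finding_verdicts: list[str]) -> str:
    """A script's class is the worst (highest-severity) verdict among its findings."""
    # Assumption: a script with no findings counts as ready.
    return max(finding_verdicts, key=SEVERITY.__getitem__, default="ready")


def summarize(scripts: dict[str, list[str]]) -> dict[str, int]:
    """Count scripts per rolled-up class (one reading of the summary counts)."""
    counts = {name: 0 for name in SEVERITY}
    for verdicts in scripts.values():
        counts[roll_up(verdicts)] += 1
    return counts
```

For the walk-through above, `roll_up(["ready", "convertible"])` returns `"convertible"`, and a single `"blocked"` among any number of `"ready"` findings returns `"blocked"`.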
the artifact carries its own AUDIT
Here is the move that makes this more than a linter. The findings don't go to a log or a
terminal you'll close — they go into the toolkit's own manifest.org.
The audit finds the stage-1 ** TODO dependency audit heading (or the previous
audit section, if you've run before), truncates from there, and appends a fresh section.
It's idempotent by design: every run replaces the last, so the carried audit can
never go stale.
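The find, truncate, append cycle is what makes the write idempotent. A minimal sketch, assuming the section is located by a heading-prefix match and always sits at the end of the manifest; `replace_audit_section` is a hypothetical name, not the module's function.

```python
BAIT_HEADING = "** TODO dependency audit"
AUDIT_HEADING = "** dependency audit (static, auto)"


def replace_audit_section(manifest: str, section: str) -> str:
    """Cut the manifest at the bait (or the previous audit) and append a fresh section."""
    lines = manifest.splitlines()
    cut = next(
        (i for i, line in enumerate(lines)
         if line.startswith((BAIT_HEADING, AUDIT_HEADING))),
        len(lines),                      # no heading found: append at the end
    )
    return "\n".join(lines[:cut] + section.splitlines()) + "\n"
```

Because everything from the heading onward is discarded, running the function twice with the same section yields the same text as running it once.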
This is the parent lesson's whole thesis — the manual is the interface — recursed one level down. The parent said a toolkit's documentation lives inside the toolkit so it can't drift from the tool. The audit's findings live in the same place for the same reason: a verdict that lives in the artifact can't disagree with the artifact.
Concretely. Stage 1 scaffolds this bait:
** TODO dependency audit
The import was parse-only. Next: the wasm compatibility audit classifies every carried script ready/convertible/blocked and writes the fix-up plan here.
After the audit runs, that heading is gone, replaced by the real findings — one sub-heading per script, one line per finding, each carrying its kind, name, verdict, and the reason:
** dependency audit (static, auto)
*** calc.py — blocked (python3)
- interpreter =python3= :: blocked — no python lane today — rewrite in a covered lane or split the logic
- pip =numpy= :: blocked — no python lane
And the empty case is just as honest. A toolkit with no scripts/ directory —
pure guidance, manuals only — gets a one-line section that says exactly that: no carried
scripts — guidance-only toolkit, nothing to convert. The audit never pads, and never
pretends there's work where there isn't.
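The section layout shown above (one sub-heading per script, one line per finding, a one-liner for the empty case) could be produced by something like the following. It is a sketch that reproduces the visible format only; `Finding` and `render_audit` are hypothetical names and the input shape is an assumption.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    kind: str      # e.g. "interpreter", "binary", "npm", "pip"
    name: str
    verdict: str
    reason: str


def render_audit(scripts: dict[str, tuple[str, str, list[Finding]]]) -> str:
    """scripts maps filename -> (rolled-up verdict, interpreter, findings)."""
    lines = ["** dependency audit (static, auto)"]
    if not scripts:
        # The empty case gets a single honest line instead of padding.
        lines.append("no carried scripts — guidance-only toolkit, nothing to convert")
    for filename, (verdict, interpreter, findings) in scripts.items():
        lines.append(f"*** {filename} — {verdict} ({interpreter})")
        for item in findings:
            lines.append(f"- {item.kind} ={item.name}= :: {item.verdict} — {item.reason}")
    return "\n".join(lines) + "\n"
```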
the agent MANUAL
The findings tell you what can't cross. Stage 3 — the fix-up plan, written in
the same pass, right below the findings — tells you what to do about it, in a
grammar an agent already executes. Only the non-ready scripts become work; a fully-ready
toolkit gets a short, honest ** fix-up plan / nothing to fix — every script is
sandbox-ready, plus the line that proves it.
Otherwise the plan opens with a real org checkbox-statistics cookie —
** TODO fix-up plan [0/n], where n is the count of non-ready scripts —
and a preamble that names the done-test plainly. Then, per script, a *** TODO
headline, a - [ ] checkbox for each concrete recipe step, and always a final
checkbox: re-run the audit, this file must classify ready. Continuing the same toolkit
from above:
** TODO fix-up plan [0/3]
The agent manual: work each item, check it off, then prove the whole
toolkit — done when =wbx toolkit push demo <dir>= · =wbx toolkit build demo=
· =wbx toolkit verify demo= all pass. With an engine reachable,
=wbx toolkit audit <dir> --fix= runs that push→build→verify for you.
*** TODO calc.py (blocked — python3)
- [ ] rewrite in JS for the quickjs lane — keep the script's CLI contract (same args in, same stdout out) so callers don't change
- [ ] or split the logic into org tasks the engine runs natively
- [ ] =numpy= goes away with the python rewrite (see the interpreter item)
- [ ] re-run =wbx toolkit audit= — calc.py must classify ready
Two things are doing the real work here. The [0/3] cookie is a live
progress meter — work items, check them, watch the fraction climb — and it's plain
org grammar, the same TODO grammar agents already
read off boards. And the final checkbox is the masterstroke: the
done-test for converting the script is re-running the very tool that judged it. The
plan is machine-checkable because its acceptance criterion is another run of the audit.
The recipes are specific, not generic. A python/ruby/perl interpreter gets rewrite
in JS for the quickjs lane — keep the script's CLI contract. A curl gets
route HTTP through the Dock — in JS use fetch (engine-shimmed); in
shell, call the engine's http capability from a task. An unknown interpreter gets
identify the language; if it's in a compile lane (c/zig/rust/go) declare a build recipe,
then wbx toolkit build produces the wasm. Each verdict knows its own way across.
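The plan's grammar is regular enough to sketch: a statistics cookie sized to the non-ready scripts, one TODO headline per script, one checkbox per recipe step, and a closing re-run checkbox. `fixup_plan` here is an illustrative function, not the module's; the preamble is omitted and the exact wording of the nothing-to-fix form is a guess.

```python
def fixup_plan(non_ready: dict[str, tuple[str, list[str]]]) -> str:
    """non_ready maps filename -> (verdict label, recipe steps)."""
    if not non_ready:
        # Assumed layout for the fully-ready case.
        return "** fix-up plan\nnothing to fix — every script is sandbox-ready\n"
    lines = [f"** TODO fix-up plan [0/{len(non_ready)}]"]   # org statistics cookie
    for filename, (label, steps) in non_ready.items():
        lines.append(f"*** TODO {filename} ({label})")
        lines.extend(f"- [ ] {step}" for step in steps)
        # Every script ends with the same acceptance test: the audit itself.
        lines.append(f"- [ ] re-run =wbx toolkit audit= — {filename} must classify ready")
    return "\n".join(lines) + "\n"
```

The cookie starts at zero because nothing is checked at generation time; org tooling, or an agent editing the file, updates the numerator as items are completed.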
closing the LOOP
The bare audit diagnoses. Add --fix and it converts — or rather, it runs the
exact three commands the plan's preamble named, against a live engine:
push → build → verify. Push zips the directory and installs it on the engine; build
runs the toolkit's build (resolving and bundling the npm lane, compiling the C/Zig/Rust
lanes); verify checks the result. These are real RCP calls to
/rcp/toolkit/install, /build, and /verify — the same
vocabulary the parent lesson used in its terminal story.
sequenceDiagram
participant wbx as wbx (your machine)
participant eng as engine (Nexus)
wbx->>wbx: audit_static — scan · classify · write manifest
alt --fix and an engine is reachable
wbx->>eng: install(zip) — /rcp/toolkit/install
eng-->>wbx: pushed
wbx->>eng: build — /rcp/toolkit/build
eng-->>wbx: built
wbx->>eng: verify — /rcp/toolkit/verify
eng-->>wbx: verified ✓
Note over wbx: exit 0 · fix: pushed → built → verified
else --fix but no engine
wbx->>eng: install …
eng--xwbx: unreachable
Note over wbx: exit 3 — start one with wbx deploy local
end
The output shapes match the two readers. For a human, the report prints, then a line:
fix: pushed demo → built → verified, followed by the engine's structural verify
lines. For an agent, --json wraps it in the mode envelope — the data carries
{ audit: {…}, fix: { pushed, built, verified } } — so a machine reads the result
without scraping prose.
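An agent consuming the --json output might check the fix stages like this. The sketch assumes it has already been handed the data object as a JSON string, and that pushed, built, and verified are truthy on success; the outer envelope fields and the value types are not specified in this lesson, so both are assumptions, as is the name `fix_succeeded`.

```python
import json


def fix_succeeded(data_json: str) -> bool:
    """True only if the data object reports all three fix stages as truthy."""
    data = json.loads(data_json)
    fix = data.get("fix") or {}          # a bare audit carries no fix object
    return all(fix.get(stage) for stage in ("pushed", "built", "verified"))
```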
And the failure modes are stable, not ad-hoc. The bare audit on a broken setup still exits
0 — a diagnosis isn't a failure. But --fix with no engine reachable exits
3, with the hint no engine reachable — start one with wbx deploy local
or set WB_ENGINE_URL. That 3 is part of a fixed exit-code map every wbx
verb shares, so agents can branch on it:
| exit | means | what an agent does |
|---|---|---|
| 0 | ok / diagnosis written | read the report, work the plan |
| 3 | engine unreachable | start an engine, retry --fix |
| 4 | not found | fix the id or path |
| 5 | verification failed | the conversion is wrong — re-open the plan |
| 6 | conflict | resolve the clashing state |
| 7 | auth rejected | fix credentials |
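Branching on that map is a dictionary lookup. The sketch below copies the rows of the table; `EXIT_ACTIONS` and `next_step` are hypothetical names, and the fallback for codes the table does not list is an assumption.

```python
EXIT_ACTIONS = {
    0: "read the report, work the plan",
    3: "start an engine, retry --fix",
    4: "fix the id or path",
    5: "re-open the plan",
    6: "resolve the clashing state",
    7: "fix credentials",
}


def next_step(exit_code: int) -> str:
    """Map a wbx exit code to the agent's next action."""
    # Codes absent from the table are not guessed at.
    return EXIT_ACTIONS.get(exit_code, "unmapped exit code: stop and report")
```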
One honest note about what verify checks. The engine-side
verify is structural — manifest present, skills overview present, exec/cap/trust
checks — and it explicitly reports that native :role pre bash blocks are
disabled (the platform's native-exec ban). Verify confirms the toolkit is
well-formed and compatible. It does not confirm it's correct — that's a different
gate.
what the audit will NOT tell you
Honesty section. The audit is sharp precisely because it knows its edges, and you should too.
- Scope is scripts/ only. The file scan reads the files directly in the toolkit's scripts/ directory: non-recursive, sorted, and nothing else. No subdirectories, no other folders. A toolkit with no scripts/ gets an empty audit. (The spec's stage-2 wording gestures at scanning for network and filesystem expectations; today's code doesn't do that. There's no manifest-dependency scan, no filesystem-expectation detection. This lesson sides with the code.)
- The parse is heuristic. The command-position split is good enough offline, not a shell parser. It reads the common shapes well and won't catch every exotic construction. It's a diagnosis, not a proof.
- Silence is not a blessing. A binary the audit doesn't flag isn't certified safe; it's unrecognized, and the audit refuses to speculate. Absence of a verdict is absence of a verdict.
- "Blocked" means no lane today. The python verdict is dated, not eternal. It reflects the lanes that exist now; the day a python lane ships, that verdict changes. The audit reports the present, not a permanent law.
- Compatibility is not correctness. Ready means it'll run in the sandbox, not that it does the right thing. A script can be perfectly ready and perfectly wrong. Verify is structural; evals are the separate gate for whether the behavior is any good.
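The scope rule in the first bullet (files directly under scripts/, sorted, no recursion) is easy to state in code. A minimal sketch with a hypothetical function name:

```python
from pathlib import Path


def carried_scripts(toolkit_dir: Path) -> list[Path]:
    """Files directly inside scripts/, sorted; subdirectories are ignored."""
    scripts = toolkit_dir / "scripts"
    if not scripts.is_dir():
        return []                         # guidance-only toolkit: nothing to scan
    return sorted(p for p in scripts.iterdir() if p.is_file())
```

Anything placed in a nested folder such as scripts/lib/ would fall outside this listing, which is worth remembering before reading a clean report as full coverage.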
questions people actually ASK
Why did my manifest.org change after import?
Because the audit auto-runs at the end of every import and writes its findings straight
into the manifest — that's the design, not a side effect. It replaced the
** TODO dependency audit placeholder with the real report and the fix-up plan.
Re-running the audit replaces that section again, so it never goes stale. The artifact
carries its own audit on purpose.
It said ready — why did the build still fail?
Ready is a static verdict about lane coverage, not a build attempt. The node verdict says it plainly: the quickjs lane covers most of Node's surface, but full-Node APIs may need shims. Ready means the lane exists; it doesn't promise every API your script reaches is shimmed. The build is where you find the remaining gaps — the audit just gets you there with far fewer surprises.
Python is blocked — forever?
No. Blocked means no lane today. The verdict reflects the lanes that currently exist; it isn't a permanent judgment. The honest move right now is the one the plan prescribes: rewrite the logic in JS for the quickjs lane and keep the same CLI contract, or split it into org tasks the engine runs natively.
It didn't flag a weird binary I use — am I safe?
Not necessarily — you're unjudged. The binary table is a deliberately short list of the tools that genuinely can or can't follow; everything else returns nothing, because the audit won't speculate about a tool it doesn't recognize. Silence means "no verdict," not "approved." If it matters, test the crossing yourself.
Do I need an engine to audit?
No. The bare audit is fully static and offline — it scans, classifies, writes the
manifest, and exits 0, no engine required. You only need a reachable engine for
--fix, which actually pushes, builds, and verifies. Without an engine,
--fix exits 3 with a hint; the diagnosis itself never needs one.
What does an agent actually do with the fix-up plan?
It works it like any other plan. The [0/3] cookie and the - [ ]
checkboxes are plain org TODO grammar; the agent does each recipe step, checks it off, and —
because the last checkbox of every script is "re-run the audit, must classify ready" — it
re-runs the audit to confirm. When the whole toolkit is ready, it proves it with push →
build → verify, which is exactly what --fix automates.
keep GOING
This is the machinery under the parent's honest frontier. From here, follow the ramp out in either direction.