commands — the string that never runs

the string that RUNS

Every agent stack eventually arrives at the same dirty moment. The model decided to run jq, then grep, then something with the user's filename in it — and somewhere underneath, all of that gets glued into a single string and handed to a shell. The shell is the most powerful interpreter on the machine, and you just let a language model write programs for it out of text it read on the internet.

The industry's answer has been the blocklist: scan the string for ;, for &&, for rm -rf, for backticks, and refuse the ones that look dangerous. This is an apology, not a defense. A regex over a shell grammar is a regex over a grammar built to be re-quoted, base64'd, $IFS-smuggled and nested twelve ways. You're not winning that fight; you're announcing which round you lost.

So this page asks a different question. Not how do we sanitize the command line — but what if the command line were never a string at all?

the DEFINITION

com·mand /kə·ˈmand/ noun

1. a CLI converted to a runnable WASM module — stdin in, stdout out — that a workbook or agent invokes by name through the run-command Dock import. One name, one content-addressed artifact, one argument convention.

A toolkit is the whole bundle: a progressive-disclosure skill doc bundled with the CLI it documents. The command is what that CLI becomes — its runnable form. The manual tells an agent which commands exist; the commands are the verbs the manual is teaching. A capability need — network, a directory, more memory — isn't a separate layer underneath; it's a property of the command, declared and granted, nothing more.

a PATH with no operating system

There is no /usr/bin here, and no $PATH to scan. There's a registry — an Elixir module, CommandRegistry — and looking up a name walks two layers. First a dynamic map (everything agents and toolkits have registered this session, kept in :persistent_term). Then the static built-ins, merged last so they always win the lookup.

That merge order is a security decision, not an accident. The built-ins — upper, jq, grep, wbox — are reserved and unshadowable. A hostile dynamic registration cannot rebind jq to its own bytes for every Instance on the engine; the trusted name resolves to the trusted artifact, always. It's a supply-chain defense expressed as a sort order.

flowchart LR
  call["run-command jq"]
  subgraph reg["CommandRegistry — lookup"]
    direction TB
    dyn["dynamic map
session registrations
:persistent_term · cap 4096"]
    bi["built-ins
upper · jq · grep · wbox
RESERVED · merged last"]
  end
  art["build/commands/<sha>.wasm
content-addressed artifact"]
  call --> dyn
  dyn -- "overlaid by" --> bi
  bi -- "trusted name wins" --> art
  style reg fill:#fbfaf6,stroke:#121316
  style dyn fill:#f3c5a3,stroke:#121316
  style bi fill:#13d943,stroke:#121316,stroke-width:2.5px
  style art fill:#f2ddb0,stroke:#121316
  style call fill:#ffffff,stroke:#121316

Read the graph as one trip. A call for jq hits the dynamic map first, but the built-ins are laid over the top, so the reserved jq is the one that answers — and what it answers with is a path into the content-addressed store, build/commands/<sha>.wasm, where the artifact is named by the hash of its own bytes. A few guardrails ring the registry: names must match ^[A-Za-z0-9_.-]+$, the dynamic map is capped at 4096 entries so a flood can't exhaust persistent-term memory, and any registered path must canonicalize strictly inside that one store directory — a path that points anywhere else is refused before it can run.
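The lookup order and the three guardrails can be sketched in a few lines. This is an illustration in Python, not the Elixir CommandRegistry itself: the function names, the error atoms (`invalid_name`, `registry_full`, `outside_store`) and the placeholder built-in paths are assumptions made for the example. Only the name pattern, the 4096 cap, the store directory and the merge order come from the text above.

```python
import os
import re

NAME_RE = re.compile(r"[A-Za-z0-9_.-]+")      # name rule quoted in the text
MAX_DYNAMIC = 4096                            # dynamic-map cap quoted in the text
STORE = os.path.realpath("build/commands")    # the one content-addressed store

# Hypothetical built-in bindings; the file names are placeholders, not real hashes.
BUILTINS = {name: os.path.join(STORE, f"sha-of-{name}.wasm")
            for name in ("upper", "jq", "grep", "wbox")}

dynamic = {}  # stand-in for the session map kept in :persistent_term


def register(name, path):
    """Add or hot-swap a dynamic binding, enforcing the three guardrails."""
    if not NAME_RE.fullmatch(name):
        return ("error", "invalid_name")
    if name not in dynamic and len(dynamic) >= MAX_DYNAMIC:
        return ("error", "registry_full")
    real = os.path.realpath(path)
    # The path must canonicalize strictly inside the store directory.
    if real == STORE or os.path.commonpath([real, STORE]) != STORE:
        return ("error", "outside_store")
    dynamic[name] = real          # re-registering replaces the binding in place
    return ("ok", real)


def lookup(name):
    """Dynamic map first, built-ins merged last, so a reserved name always wins."""
    merged = {**dynamic, **BUILTINS}
    return merged.get(name)
```

In this sketch a dynamic registration under the name jq is stored but never answers: the merge puts the built-in binding on top, so `lookup("jq")` still returns the reserved artifact.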

And names hot-swap. Re-register a name and the binding is replaced in place — no restart, nothing reboots. Calls already in flight finish on the old bytes (the old sha file is never deleted out from under them); the next call observes the new binding. The registry is a living thing, edited while it runs, which is exactly the property the autopoet needs to fix a tool mid-session.

what one invocation COSTS

Invoking a command is not a function call into a shared process. It is a fresh wasmtime subprocess, born for this one run and gone after it — the :os_process isolation tier, earned by shape, not by a toggle. The envelope it runs inside is fixed and small:

| limit | constant | what it stops |
| --- | --- | --- |
| wall clock | `-W timeout=30000` | a command that hangs is trapped at 30s, not forever |
| fuel | `-W fuel=5_000_000_000` | a tight infinite loop burns fuel and traps even if it never blocks |
| stdin size | `@max_input_bytes 64MB` | an oversized pipe is a clear error, not an OOM |
| argv size | `@max_argv_bytes 256KB` | used to blow `ARG_MAX` and fail silently; now a stated error |
| file access | preopened dirs only (`--dir host::guest`) | no path resolves outside what was explicitly handed in |

The table's verdict in one line: a command can fail, loop, or overflow, and every one of those is a bounded, named outcome instead of a way out. Two runtime behaviours are worth saying aloud. First, a non-zero exit is usually just the guest CLI's normal failure (grep found nothing, jq hit a parse error), so the output returns verbatim and the status is preserved, which is what lets a shell && or || downstream make sense. That's the universal-CLI contract, kept honest. Second, every run uses a shared wasmtime compile cache, so the second invocation of python doesn't re-pay the JIT.
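The two size limits are the easiest part of the envelope to picture, because they can be checked before any subprocess exists. A minimal sketch, assuming that 64MB and 256KB are binary units and that argv size is measured as the summed byte length of the arguments. Neither detail is stated above, and the error names are invented for the example.

```python
MAX_INPUT_BYTES = 64 * 1024 * 1024   # @max_input_bytes (assumed to be 64 MiB)
MAX_ARGV_BYTES = 256 * 1024          # @max_argv_bytes (assumed to be 256 KiB)


def check_envelope(argv, stdin):
    """Return ("ok", None) or a named error, before anything is spawned.

    argv is a list of bytes objects, stdin is a bytes object.
    """
    argv_bytes = sum(len(arg) for arg in argv)
    if argv_bytes > MAX_ARGV_BYTES:
        return ("error", ("argv_too_large", argv_bytes))
    if len(stdin) > MAX_INPUT_BYTES:
        return ("error", ("input_too_large", len(stdin)))
    return ("ok", None)
```

The point of checking up front is the one the table makes: the caller gets a stated error with a number in it, not a silent `ARG_MAX` failure or an out-of-memory kill.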

the six NO's

Between an agent saying run-command and any wasm starting stands the ExecBroker — and its first principle is the one most stacks skip: there is no real OS exec here. Only registered wasm commands run. Every request walks a deny ladder, in this exact order, and every denial is audited:

sequenceDiagram
  participant G as guest (the agent)
  participant B as ExecBroker
  participant R as CommandRegistry
  participant W as wasmtime subprocess
  G->>B: host_exec(name, argv, stdin)
  Note over B: 1 · revoked?        → :revoked
  Note over B: 2 · over quota?      → :rate_limited (2k/sec floor)
  Note over B: 3 · allow == false?  → :denied (DEFAULT-DENY)
  Note over B: 4 · depth > 8?       → :max_depth
  Note over B: 5 · not in allowlist? → :command_not_granted
  B->>R: lookup name
  Note over R: 6 · not registered?  → :unknown_command
  R-->>B: artifact path
  B->>W: argv + stdin (structural)
  W-->>B: stdout · exit status
  Note over B: output capped at 8MB
  B-->>G: result (or audited deny)

Walk the ladder as a story. First the principal: if this tenant's been revoked, stop — :revoked. Second the meter: the rate limiter's default floor is two thousand broker calls per second per tenant; past that, :rate_limited. Third the default: allow is false until something grants it — default-deny, :denied. Fourth the depth: a chain deeper than eight is :max_depth. Fifth the allowlist: this Instance was granted a specific set of command names, and a name outside it is :command_not_granted. Sixth and last, after all that, the registry itself: a name nobody ever registered is :unknown_command. Six refusals before bytes execute, each one logged.
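The ladder reads naturally as an ordered list of (condition, reason) pairs where the first true condition wins. The sketch below is an illustration, not the ExecBroker: the `Request` shape, the `broker_check` name and the list used as an audit log are assumptions. The six reasons, their order and the depth bound of eight come from the text.

```python
from dataclasses import dataclass

MAX_DEPTH = 8


@dataclass
class Request:
    name: str = ""
    tenant_revoked: bool = False
    over_quota: bool = False
    allow: bool = False                 # default-deny until something grants it
    depth: int = 0
    allowlist: frozenset = frozenset()


def broker_check(req, registry, audit):
    """Walk the six rungs in order; return ("ok", path) or ("error", reason)."""
    rungs = [
        (req.tenant_revoked, "revoked"),
        (req.over_quota, "rate_limited"),
        (not req.allow, "denied"),
        (req.depth > MAX_DEPTH, "max_depth"),
        (req.name not in req.allowlist, "command_not_granted"),
        (req.name not in registry, "unknown_command"),
    ]
    for denied, reason in rungs:
        if denied:
            audit.append((req.name, reason))   # every denial is logged
            return ("error", reason)
    return ("ok", registry[req.name])
```

Because the rungs are evaluated top to bottom, a revoked tenant that is also over quota is reported as revoked, and the registry is only consulted once every policy check has passed.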

And one design choice underneath all six: exec is its own capability, distinct from commands, vfs, or networking. A profile can grant durable storage or the network without ever granting the ability to spawn a command — because the ungranted exec import is never even merged into the guest's import table. An un-granted cap isn't un-allowed; it's un-importable, which makes it un-callable.
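The difference between "refused" and "not importable" is easy to show with a toy import table. In this sketch the capability names mirror the text, while the host function names (`host_exec` aside) and the table-building helpers are hypothetical stand-ins for whatever the engine actually binds.

```python
def host_exec(name, argv, stdin):
    """Stand-in for the broker entry point."""
    raise NotImplementedError


def vfs_read(path):          # hypothetical storage import
    raise NotImplementedError


def net_fetch(url):          # hypothetical network import
    raise NotImplementedError


HOST_FUNCS_BY_CAP = {
    "exec": {"host_exec": host_exec},
    "vfs": {"vfs_read": vfs_read},
    "net": {"net_fetch": net_fetch},
}


def build_import_table(granted_caps):
    """Merge in only the imports whose capability was granted."""
    table = {}
    for cap in sorted(granted_caps):
        table.update(HOST_FUNCS_BY_CAP[cap])
    return table


def guest_call(table, func, *args):
    """A guest can only call what is present in its own import table."""
    if func not in table:
        raise LookupError(f"{func} is not importable")
    return table[func](*args)
```

A profile built with `{"vfs", "net"}` has no `host_exec` entry at all, so there is no permission check to bypass: the call fails at lookup, before any broker code runs.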

argv is a list, not a SENTENCE

depth rung · skippable — the wire format, byte by byte

Here is the whole trick, in one diagram. When an agent runs grep "ada; rm -rf /", the broker never builds a string. It writes a length-prefixed, little-endian structure and the worst the dangerous part can do is fail to match:

agent runs:  grep "ada; rm -rf /" < users.txt

wire bytes:  [name_len:u32][name][argc:u32][(arg_len:u32)(arg)]*[stdin_len:u32][stdin]
             ─────────────────────────────────────────────────────────────────────
             [4] g r e p   [1]   [13] a d a ;   r m   - r f   /   [N] <users.txt bytes…>
              │    │         │      │   └── one argument. the ; is byte 0x3B, not syntax.
              │    │         │      └── arg_len, little-endian u32
              │    │         └── argc = 1 argument
              │    └── the command name
              └── name_len, little-endian u32

There is no shell to interpret that semicolon. It crosses the boundary as the bytes of one discrete argument; wasmtime receives a clean argv list of exactly two elements (grep and the literal needle); and the most hostile thing ada; rm -rf / can accomplish is to not appear in users.txt. The classic injection class isn't mitigated here. It's unrepresentable — there's no field in this structure where a metacharacter could mean anything but itself. No delimiter, so no delimiter ambiguity.
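The framing is small enough to write out in full. A sketch of an encoder and decoder for the layout shown above, using Python's `struct` module with `<I` for a little-endian u32. The function names are made up for the example; the field order is the one in the diagram.

```python
import struct


def encode(name, args, stdin):
    """name_len | name | argc | (arg_len | arg)* | stdin_len | stdin, all u32 LE."""
    out = [struct.pack("<I", len(name)), name, struct.pack("<I", len(args))]
    for arg in args:
        out += [struct.pack("<I", len(arg)), arg]
    out += [struct.pack("<I", len(stdin)), stdin]
    return b"".join(out)


def decode(buf):
    """Inverse of encode; returns (name, args, stdin) as bytes objects."""
    pos = 0

    def take_u32():
        nonlocal pos
        (value,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        return value

    def take_bytes():
        nonlocal pos
        length = take_u32()
        chunk = buf[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("truncated frame")
        pos += length
        return chunk

    name = take_bytes()
    args = [take_bytes() for _ in range(take_u32())]
    stdin = take_bytes()
    return name, args, stdin
```

Round-tripping `encode(b"grep", [b"ada; rm -rf /"], data)` through `decode` returns the needle as one bytes object, semicolon included. Nothing in the decoder ever inspects the contents of an argument, which is the whole property.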

Two argument conventions ride this same wire. Most commands use :argv, meaning real wasmtime argv, the universal CLI ABI. A couple of legacy tools (jq, grep) use :stdin1, where the filter or pattern is folded into the first line of stdin rather than passed as argv. It is an older protocol the registry still honors so those tools behave as their authors expect. Either way, the argument crosses the wire as one length-prefixed field and never becomes shell text.
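How the two modes differ at the moment of launch can be sketched as one small function. The exact :stdin1 layout is an assumption here: the sketch puts the first argument on line one, a newline, then the original stdin, and passes any remaining arguments through as argv. The function name is hypothetical.

```python
def to_wasmtime_input(mode, args, stdin):
    """Return (argv_tail, stdin_bytes) for the given argument mode."""
    if mode == "argv":
        # The arguments become real argv entries; stdin is untouched.
        return list(args), stdin
    if mode == "stdin1":
        # Assumed layout: first argument on line 1, then the payload.
        first = args[0] if args else b""
        return list(args[1:]), first + b"\n" + stdin
    raise ValueError(f"unknown arg mode: {mode!r}")
```

In both branches the argument stays a bytes value that is either appended to a list or concatenated ahead of the payload. Neither path involves quoting, escaping or re-parsing.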

the shell that ISN'T one

Agents still want to type jq '.users[].name' | grep ada | upper and have it work. They can, and there is still no shell. Workbooks.Shell is an Elixir pipeline evaluator: it parses the line, then runs each stage as a registered WASM command, piping stdout into the next stage's stdin in memory. No shell process, no shell fork, just one fresh wasmtime subprocess per stage.

flowchart LR
  src["jq '.users[].name' | grep ada | upper"]
  s1["jq
:stdin1 — filter on line 1
fresh wasmtime · fuel 5e9 · 30s"]
  s2["grep ada
:stdin1 — pattern on line 1
fresh wasmtime"]
  s3["upper
:argv — raw stdin
fresh wasmtime"]
  out["stdout"]
  src --> s1
  s1 -- "bytes piped in memory · trim:false" --> s2
  s2 -- "byte-exact · no fork" --> s3
  s3 --> out
  style s1 fill:#f3c5a3,stroke:#121316
  style s2 fill:#f2ddb0,stroke:#121316
  style s3 fill:#aee5c2,stroke:#121316
  style out fill:#13d943,stroke:#121316,stroke-width:2.5px
  style src fill:#ffffff,stroke:#121316

Follow that pipeline once, because it's the worked example that proves the whole construct. The shell splits the line quote-aware, so a | inside a quoted jq filter stays inside its stage and never reads as a pipe. Stage one is jq in :stdin1 mode, so the filter becomes the first line of stdin ahead of the JSON. It runs in a fresh wasmtime process (fuel five billion, thirty-second wall clock) and its stdout is piped on byte-exact, with trailing newlines untouched, because trimming a \n here would make a downstream wc -l lie. Stage two is grep, pattern on line one. Stage three is upper, argless, raw stdin. Three wasmtime processes, zero shell forks, and the exit status of the last stage is what drives any && that follows.
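Two pieces of that walk-through fit in a short sketch: the quote-aware split, and the in-memory hand-off between stages. This is a simplification rather than Workbooks.Shell. It only understands a single | (it assumes ;, && and || were separated in an earlier pass), it does not handle backslash escapes, and the `commands` table maps names to plain Python callables standing in for wasmtime runs.

```python
def split_pipeline(line):
    """Split on | outside single or double quotes; quotes stay in the stage text."""
    stages, current, quote = [], [], None
    for ch in line:
        if quote:
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == "|":
            stages.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    stages.append("".join(current).strip())
    return stages


def run_pipeline(stages, commands, stdin=b""):
    """Feed each stage's stdout to the next stage's stdin, byte-exact."""
    data, status = stdin, 0
    for stage in stages:
        name, _, rest = stage.partition(" ")
        status, data = commands[name](rest, data)   # no trimming between stages
    return status, data   # the last stage's exit status is the pipeline's status
```

Given `jq '.users[] | .name' | grep ada | upper`, the splitter yields three stages and the | inside the quoted filter stays with jq.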

The shell speaks a deliberate subset: ;, &&, || with real short-circuiting on exit status; NAME=value variables with $NAME / ${NAME} expansion (an unknown name is left literal, so jq's own $var survives); and <, >, >> redirection confined to preopened directories, where a path outside the sandbox is a clean {:outside_sandbox, file} error. Two tokens are intentional no-ops, not errors: 2>/dev/null and 2>&1 are swallowed silently, because choking on them was once the single biggest source of per-run thrash. And wbox, one multicall binary compiled from C, supplies the everyday vocabulary as applets: cat echo seq head wc nl rev basename dirname tr sort uniq tail true false, dispatched as wbox <applet>. Bash ergonomics, zero shell forks.
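Two rules from that subset are worth seeing as code: expansion that leaves unknown names alone, and short-circuiting on exit status. A minimal sketch, assuming variable names follow the usual letter, digit and underscore pattern and that the line has already been parsed into (operator, command) steps. Both function names are invented for the example.

```python
import re

VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def expand(text, env):
    """Expand $NAME and ${NAME}; an unknown name is left exactly as written."""
    def substitute(match):
        name = match.group(1) or match.group(2)
        return env[name] if name in env else match.group(0)
    return VAR.sub(substitute, text)


def run_chain(steps, run):
    """steps is [(op, cmd)] with op in (None, ";", "&&", "||"); run(cmd) returns a status."""
    status = 0
    for op, cmd in steps:
        if op == "&&" and status != 0:
            continue            # previous step failed, so skip
        if op == "||" and status == 0:
            continue            # previous step succeeded, so skip
        status = run(cmd)
    return status
```

With `env = {"NAME": "ada"}`, the text `grep $NAME | jq '$var'` expands only `$NAME`, so the `$var` reaches jq untouched.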

sixty-some programs, each a row of DATA

None of this would matter if the command set were a toy. It isn't. Around sixty-four real programs are seeded and live, each one proved by actually running it under wasmtime — and each one is, fundamentally, just data: a name, a URL, a sha256 pin, a wasm path, an arg mode. The lane is the registry mechanism; the tool is the row.

| lane | trust gate | examples (a sample of ~64) |
| --- | --- | --- |
| A: prebuilt fetch | sha256 pin (inert bytes until run) | `python` (CPython 3.12), `ruby`, `php`, `pandoc`, `openssl`, `sqlite3`, `prolog`, `wabt` → 12 tools (wat2wasm, wasm2wat, wasm-validate…) |
| B: C source built in-sandbox | sha-pinned tarball → clang.wasm | `lua`, `zstd`, `sodium`, `pcre2`, `png`, `harfbuzz`, `freetype`, `qjs` (27 names) |
| inline: self-authored | Instance Policy profile caps | Rust (`wfreq`, `rgx`), Python tools (`yaml`, `tmpl`), JS-npm (`pdf`, `arrow`, `protobuf`) |

The verdict of that table: trust scales with origin, not with effort. A prebuilt is trusted exactly as far as its hash, which is why a sha pin is the whole gate — a .wasm someone hands you is inert until it runs in the sandbox. A C-source tool is trusted by the pinned tarball it was built from, compiled by a wasm clang with no native toolchain anywhere. And an inline command is trusted by the Instance profile that built it — it can hold no capability its Instance didn't already have. Whole interpreters arrive as four lines of data each; the WABT suite arrives as one tarball that fans out into a dozen commands.

Here's one real row, verbatim — CPython, in the sandbox, as four lines:

%{name: "python", kind: :wasm,
  url: "https://github.com/vmware-labs/.../python-3.12.0.wasm",
  sha: "e5dc5a398b07b54ea8fdb503bf68fb583d533f10ec3f930963e02b9505f7a763",
  mode: :argv}

Seeding happens at boot, inside a background Task, so fetching the catalog never blocks the engine from starting. By the time an agent asks, the rows are already in the registry.

bytes that prove THEMSELVES

depth rung · skippable — the supply chain, end to end

A name resolving to bytes is only safe if the bytes can't lie. So every lane funnels through the same content-addressing spine, and the artifact is re-checked at the last possible moment:

flowchart LR
  src["fetch (Erlang TLS, no curl)
or build (clang.wasm)"]
  pin["sha256 the bytes"]
  ca["write build/commands/<sha>.wasm
idempotent · same bytes, same path"]
  reg["register name → path
+ manifest registry.json"]
  run["RUN: re-hash the file
mismatch → :artifact_integrity"]
  src --> pin --> ca --> reg --> run
  run -- "match" --> ok["wasmtime executes"]
  style pin fill:#f3c5a3,stroke:#121316
  style ca fill:#f2ddb0,stroke:#121316
  style run fill:#13d943,stroke:#121316,stroke-width:2.5px
  style ok fill:#aee5c2,stroke:#121316

Trace the spine. Bytes arrive — fetched over a pure-Erlang TLS GET that verifies against the OS trust store (there is no curl binary anywhere; a prebuilt fetch is Erlang and TLS, full stop), or built in-sandbox by a wasm compiler. They're sha256'd, and that hash is the filename: build/commands/<sha>.wasm. The same bytes always land on the same path, so registration is idempotent. The name-to-path binding is recorded in a manifest, registry.json, so a reboot re-registers everything with no re-fetch and no rebuild. And then the closure that matters: at run time, the content-addressed file is re-hashed and refused on mismatch — {:artifact_integrity, path} — which closes the time-of-check / time-of-use gap that a path-based store would leave wide open.
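The spine reduces to two operations: store bytes under their own hash, and re-hash the file immediately before running it. A sketch using Python's `hashlib`. The function names are hypothetical, and the tuple shapes imitate the `{:artifact_integrity, path}` error quoted above without claiming to be the engine's API.

```python
import hashlib
import os


def store(store_dir, data):
    """Write bytes to <sha256>.wasm; the same bytes always land on the same path."""
    sha = hashlib.sha256(data).hexdigest()
    path = os.path.join(store_dir, f"{sha}.wasm")
    if not os.path.exists(path):
        with open(path, "wb") as handle:
            handle.write(data)
    return path


def verify_before_run(path):
    """Re-hash the file at run time and refuse it if the name no longer matches."""
    expected = os.path.basename(path)[: -len(".wasm")]
    with open(path, "rb") as handle:
        actual = hashlib.sha256(handle.read()).hexdigest()
    if actual != expected:
        return ("error", ("artifact_integrity", path))
    return ("ok", path)
```

Because the expected hash is carried in the filename, the run-time check needs no side table: a file modified after registration fails the comparison on its own.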

The same posture extends outward. A workbook bundle carries its commands as <name>.wasm parts; Library.install sha-pins the bundle and registers each one — installing a bundle never widens capabilities. A third-party toolkit with invalid provenance refuses to build at all. And a toolkit whose declared CLI_BIN collides with a reserved name is rejected before compile, because that name is attacker-controlled data and the reserved set wins there too. The native escape hatches that other stacks lean on — cargo install, go get, native zig cc — return explicit :lane_unavailable errors here rather than shelling out. wasmtime running a .wasm is the architecture. Native binaries are banned.

commands that write COMMANDS

The catalog isn't a fixed menu — agents extend it mid-session. With wb toolkit build-inline <name> <lang> <file> an agent writes source in Rust, C, Zig, JS, TS, or Go, and the result is a new registered command. The astonishing part is what does the compiling: the compiler is itself wasm. Write → build → content-address → register, all inside the sandbox — no native toolchain, no host escape.

$ wb toolkit build-inline wfreq rust ./wfreq.rs
built + registered command `wfreq` (rust) → build/commands/<sha>.wasm
run it via the Dock: run-command wfreq

And self-authoring never widens caps. The command wfreq holds exactly the capabilities its Instance profile granted — nothing more — because it was built by a process that itself held only those caps. The other direction up the ladder is promotion: an inline command that proves useful can be materialized into a real toolkit directory, where Org owns the spec (a #+EXEC: command manifest, a #+TRUST: first-party line, a skill stub) and WASM owns the artifact. Promotion persists the source, not just the compiled bytes — session, to workspace, to registry. This is the toolkit lesson's "toolkits compose," seen from the runnable side.

the honest EDGES

Where this construct stops, plainly. The shell is a subset, on purpose: no globbing, no subshells, no command substitution, and stderr isn't real plumbing — 2>/dev/null is a swallowed no-op, not a redirection. If you need POSIX shell theatre, this isn't it; if you need a safe way to chain wasm commands, it is.

The banned native lanes are a choice, not a gap. cargo install, go get, and native zig cc return :lane_unavailable rather than quietly shelling out to a real toolchain — we'd rather hand you an honest error than a hidden escape. The same honesty applies to depth: the broker's eight-level recursion bound is plumbed and tested (a depth-99 call returns {:error, :max_depth} in the suite), but an ordinary plain-wasmtime command has no host_exec import at all — only Dock-harness artifacts can nest. So we won't claim live deep command chains beyond what actually ships; the bound is real, the nesting it bounds is narrow today.

And cold builds are slow. A C-source tool compiled by wasm clang the first time is not instant — but it's content-addressed, so it's a one-time cost; the second run is a hash lookup. Some software, finally, just fights the crossing — not everything that compiles to a native binary compiles cleanly to WASI, and when it doesn't, we say so rather than ship a broken row.

questions people actually ASK

Is this bash?

No — and that's the entire point. There's no shell process and no command line; the "shell" is an Elixir pipeline evaluator and each stage is a separate wasmtime subprocess. Arguments cross as a length-prefixed structural list, so a metacharacter is just a byte in an argument. You get pipeline ergonomics without ever building a string a shell could interpret.

Why did my command not see my file?

A command can only reach preopened directories, mapped in as host::guest with --dir. If you didn't preopen the directory the file lives in, there's no path inside the sandbox that resolves to it — by construction, not by a setting. Grant the directory and it appears.
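A rough model of why an un-preopened file has no name inside the sandbox: guest paths are resolved against the preopen map and nothing else. The sketch below is purely lexical and illustrative. The real resolution is done by wasmtime and WASI (and has to account for symlinks), and the `preopens` dict and function name are assumptions.

```python
import posixpath


def resolve_guest_path(preopens, guest_path):
    """preopens maps guest dir -> host dir; return a host path, or None if unreachable."""
    if not guest_path.startswith("/"):
        return None                       # this sketch only handles absolute guest paths
    norm = posixpath.normpath(guest_path)  # collapses ".." lexically
    for guest_dir, host_dir in preopens.items():
        base = posixpath.normpath(guest_dir)
        if posixpath.commonpath([base, norm]) == base:
            rel = posixpath.relpath(norm, base)
            return posixpath.normpath(posixpath.join(host_dir, rel))
    return None
```

With only `/data` preopened, `/data/users.txt` maps to a host file, while `/data/../etc/passwd` and `/home/me/file` resolve to nothing at all.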

Can a hostile command hijack jq?

No. The built-in names — jq, grep, upper, wbox — are reserved and merged last in the registry, so a dynamic registration can never shadow them. A toolkit that even declares a CLI_BIN colliding with a reserved name is refused before it compiles.

What if it infinite-loops?

Two independent traps catch it: a wall-clock timeout of 30 seconds, and a fuel budget of five billion units, which wasmtime consumes as the guest executes instructions. A loop that never yields still runs out of fuel and traps. There's no version of "spins forever" that survives both.

Can I run my favorite CLI?

Probably — in order of effort: if there's a prebuilt WASI build, pin its sha and it's four lines of data. If it's C source, point the C-source lane at a pinned tarball and wasm clang builds it in-sandbox. If neither, write a small command yourself with build-inline. What you can't do is shell out to a native binary — that lane returns :lane_unavailable on purpose.

What's the difference between the commands and exec caps?

commands is the broad ability to work with the command surface; exec is a dedicated, least-privilege cap that specifically governs spawning a command. They're separate so a profile can grant storage or networking without granting the power to run commands — and when exec isn't granted, its import isn't even bound into the guest, so it's not a permission to refuse, it's a call that can't be made.

keep GOING

Commands are the muscle of a toolkit — the layers around them are where they get their reach and their limits.

What is a toolkit: the bundle this is the runnable half of

The Nexus: the sandbox floor a command can't fall through

The VFS: preopens, and where command files live

Agents: the one tool agents have is this shell