
the compiler RUNS in the sandbox

Everyone knows you can't run rustc or LLVM inside WebAssembly. So how does a toolkit get built in the sandbox? A lane is one language's path from untrusted text to runnable wasm where every executing stage is itself a wasm guest — and each lane is a different dodge. This page is the tricks, the convergence, and the ceilings.


the compiler is the ATTACK surface

Installing software means running a compiler. Building a Rust crate runs build.rs; installing an npm package runs an install script; compiling C runs a driver that spawns a linker. Every one of those is arbitrary code executing natively, with your permissions, before you've read a line of what you pulled down. The artifact isn't the danger — the build is.

The reflex answer is an OS sandbox: wrap the native compile in bwrap on Linux or sandbox-exec on a Mac and let it run. The repo's own design doc rejects that as insufficient, in plain words — a native compile under bwrap or seatbelt is not a sufficient boundary for adversarial untrusted source on a shared host. OS sandboxes are defense-in-depth, not isolation. That gap is exactly why microVMs exist.

So the canon here is harder than "sandbox the build." It's that untrusted source never compiles or runs natively, at all. Every lane compiles and runs user source entirely under wasmtime — and to do that, the compiler itself has to be a wasm guest. The bet of this whole layer is that a compiler is just another program, and any program can be a guest.

what a LANE is

lane /leɪn/ noun

1. one source language's path from untrusted text to runnable, sandboxed wasm — where every executing stage is itself a wasm guest running under wasmtime, never a native process.

Lanes live one-per-language under runtime/compilers/<lang>/, and they share one framework — Workbooks.Compilers — instead of six bespoke integrations. Each lane is just a directory with a manifest.org: a handful of org keywords the framework parses to learn how the lane behaves. A directory becomes a lane the moment it has one.

#+COMPILER: rust
#+CLI_BIN:  mrustc
#+KIND:     compile-to-c-then-wasm
#+BUILD:    build.sh
#+TARGET:   wasm32-wasip1
#+SOURCE:   thepowersgang/mrustc @ be69c747

The load-bearing keyword is #+KIND, and there are exactly four shapes a lane can be. They name the three tricks plus the one that skips compiling entirely:

#+KIND: how a lane gets from source to runnable

| kind | what the lane does | example |
| --- | --- | --- |
| compile-and-run | the tool both compiles and executes | c4, the C-in-four-functions VM |
| compile-to-c | emit C, then hand off to the C lane | zig1's C backend |
| compile-to-wasm | produce a real wasm object | clang, the mrustc→C chain, the JS lane |
| interpret-in-sandbox | don't compile at all; run an interpreter on the source | yaegi for Go |

six lanes, four kinds: every one of them a wasm guest at runtime

The compiler binaries are wasm artifacts, content-addressed by the sha256 of their bytes into build/commands/<sha>.wasm. The same is true of what they produce — an output is its hash, so identical source is identical artifact, deduplicated for free.
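
A minimal sketch of that content addressing, assuming the digest is written as lowercase hex (the page only says "the sha256 of their bytes"); artifact_path is a hypothetical helper name.

```python
import hashlib
from pathlib import PurePosixPath


def artifact_path(wasm_bytes: bytes, root: str = "build/commands") -> PurePosixPath:
    """Name an artifact after the sha256 of its bytes (hex encoding is an assumption)."""
    digest = hashlib.sha256(wasm_bytes).hexdigest()
    return PurePosixPath(root) / f"{digest}.wasm"


module = b"\x00asm\x01\x00\x00\x00"  # the 8-byte header of an empty wasm module
first = artifact_path(module)
second = artifact_path(module)
assert first == second  # identical bytes land on the identical path
```

Because the path is a pure function of the bytes, storing the same output twice is a no-op, which is the "deduplicated for free" property.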

six lanes, three TRICKS

Here is the whole layer on one page. Six source languages enter on the left. Two of them never reach a compiler — c4 and yaegi short-circuit straight to "run." The rest each take a different dodge to wasm, and then — this is the thing to watch — almost everything funnels into the same two calls: clang -c to make an object, then wasm-ld to link it. Both of those are one 75 MB LLVM multitool. C is the bottleneck language of the sandbox, by design:

flowchart LR
  c[".c"] --> cc["clang -c"]
  zig[".zig"] --> z1["zig1.wasm<br/>→ C"] --> cc
  rs[".rs"] --> mr["mrustc.wasm<br/>→ C"] --> cc
  js[".js"] --> jc["js_src.c<br/>byte-array"] --> cc
  ts[".ts"] --> tsc["tsc in QuickJS<br/>→ .js"] --> jc
  c4lane[".c (c4 VM)"]:::run
  go[".go"] --> yaegi["yaegi-run.wasm<br/>interpret"]:::run
  cc["llvm.core.wasm<br/>clang -c"] --> ld["llvm.core.wasm<br/>wasm-ld"] --> out["out.wasm"]
  out --> run["wasmtime run"]:::run
  yaegi --> run
  c4lane --> run
  classDef run fill:#13d943,stroke:#121316,stroke-width:2px;
  style cc fill:#f3c5a3,stroke:#121316,stroke-width:2.5px
  style ld fill:#f3c5a3,stroke:#121316,stroke-width:2.5px
  style out fill:#fbfaf6,stroke:#121316

Read it as three tricks. Interpret it — c4 runs C in a tiny VM, yaegi interprets Go, QuickJS interprets JS, and a real tsc runs inside QuickJS to strip types. Transpile it to C — mrustc turns Rust into C, zig1 emits C from its C backend. Run a real LLVM someone already built for wasi — the keystone clang lane. Whatever the front half does, the back half is C, and C means clang plus wasm-ld. The rest of this page walks the lanes one at a time, each as a trick, a proof, and a limit.

the keystone: a linker that can't be SPAWNED

Depth rung. Start here because everything else converges here. The production C lane is YoWASP clang and lld, version 22.1.0 — LLVM built for the wasm32-wasi target, so it runs on wasmtime. It's prebuilt and sha-pinned from the @yowasp/clang npm package, not built in-house. The single file llvm.core.wasm is a ~75 MB multitool that dispatches on its first argument to act as either clang or wasm-ld, and it imports exactly one thing — wasi_snapshot_preview1. Nothing else.

Why clang and not something smaller? Because the plan was reshaped by one hard finding: no non-LLVM C compiler emits wasm. tcc and chibicc emit native code; c4 interprets. A real C-to-wasm compiler running inside wasm is, necessarily, clang and LLVM. There's no lighter path that's honest.

And there's a wall that shapes the whole lane: WASI has no fork. The clang driver cannot spawn a subprocess under WASI, so it can't do its usual trick of internally invoking the linker. Compile and link become two separate llvm.core.wasm invocations the host orchestrates by hand:

sequenceDiagram
  participant H as host (Workbooks.Compilers)
  participant W as wasmtime
  H->>W: run llvm.core.wasm  clang -c  hello.c  → hello.o
  Note over W: /usr sysroot · /work job dir · /tmp writable
  W-->>H: hello.o (object)
  H->>W: run llvm.core.wasm  wasm-ld  hello.o -lc  -z stack-size=8388608
  Note over W: 8 MiB stack (wasm-ld's 64 KiB<br/>default overflows modest buffers)
  W-->>H: hello.wasm
  H->>W: run wasmtime hello.wasm
  W-->>H: 10!=3628800

The mounts are the same every job: the sysroot at /usr, the per-job directory at /work, a writable /tmp. The 8 MiB stack is a deliberate default — wasm-ld's stock 64 KiB overflows on modest automatic buffers, so the lane links with -z stack-size=8388608. A compiler gets raised budgets too: where an ordinary command run caps at 5 billion fuel and 30 seconds, a clang invocation gets 800 billion fuel and 180 seconds — generous, but still a hard runaway-trap, not infinity.
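
The split can be pictured as a plan the host builds before it runs anything. The sketch below is hypothetical throughout: Invocation and plan_c_build are invented names, not the Workbooks.Compilers API, and giving the link step the same raised budget as the compile step is an assumption. The flags (-c, -o, -lc, -z stack-size=) and the numbers come from the text.

```python
from dataclasses import dataclass

# Values quoted in the text.
COMPILER_FUEL = 800_000_000_000     # raised fuel budget for a compiler run
COMPILER_TIMEOUT_S = 180            # raised wall-clock budget, in seconds
STACK_SIZE = 8 * 1024 * 1024        # 8 MiB, passed to wasm-ld as -z stack-size=
MOUNTS = ("/usr", "/work", "/tmp")  # sysroot, per-job dir, writable scratch


@dataclass(frozen=True)
class Invocation:
    """One run of llvm.core.wasm under wasmtime (hypothetical record type)."""
    tool: str                       # "clang" or "wasm-ld"
    args: tuple[str, ...]
    fuel: int = COMPILER_FUEL
    timeout_s: int = COMPILER_TIMEOUT_S
    mounts: tuple[str, ...] = MOUNTS


def plan_c_build(sources: list[str], output: str) -> list[Invocation]:
    """One compile step per source file, then a single separate link step."""
    plan, objects = [], []
    for src in sources:
        obj = src.rsplit(".", 1)[0] + ".o"
        objects.append(obj)
        plan.append(Invocation("clang", ("-c", src, "-o", obj)))
    link_args = (*objects, "-lc", "-z", f"stack-size={STACK_SIZE}", "-o", output)
    plan.append(Invocation("wasm-ld", link_args))
    return plan


for step in plan_c_build(["hello.c"], "hello.wasm"):
    print(step.tool, " ".join(step.args))
# clang -c hello.c -o hello.o
# wasm-ld hello.o -lc -z stack-size=8388608 -o hello.wasm
```

The point of the shape is that the driver never spawns anything: the host owns the sequencing, and each step is an ordinary bounded guest run.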

Multi-file builds compile their objects in parallel, bounded to min(schedulers-1, 6) at a time — worth roughly 1.7× on a real workload (a 33-file Lua build dropped from 280 s to 166 s). And the proof you can run yourself: Compilers.compile_and_run_c("hello.c") prints 10!=3628800, every stage above a wasm guest. (The c4 VM — rswier's "C in four functions" — still ships as the spike that first proved the model; the tcc productionization it once planned was superseded by this clang lane.)
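
The concurrency bound and the quoted speedup are easy to check by hand. In this sketch the max(1, ...) floor is an added assumption (the text gives only min(schedulers-1, 6)), and both function names are hypothetical.

```python
def max_parallel_compiles(schedulers: int) -> int:
    """Bound from the text: min(schedulers - 1, 6).

    The max(1, ...) floor is an assumption added here so that a
    single-scheduler host still compiles one object at a time.
    """
    return max(1, min(schedulers - 1, 6))


def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Ratio of serial to parallel wall-clock time."""
    return serial_seconds / parallel_seconds


print(max_parallel_compiles(4))     # 3
print(max_parallel_compiles(16))    # 6
print(round(speedup(280, 166), 2))  # 1.69
```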

Rust without RUSTC

Depth rung, and the crown jewel. The obvious move would be rustc.wasm. It doesn't exist, and it can't easily — for reasons that are themselves a tour of the WASI walls. rustc embeds LLVM as a library and shells out to a proc-macro server and a linker as subprocesses — impossible without fork. It wants threads and realpath, neither available under WASI. No prebuilt rustc.wasm exists, and the alternative backend cg_clif emits native code with no wasm32 target.

So the lane uses mrustc — a C++ program that compiles Rust to C. Built to wasm32-wasi, it runs in-sandbox; its C output goes straight into the clang lane above. The end-to-end proof is real and dated (2026-06-07). An honest Rust program —

fn fib(n: u32) -> u32 { if n < 2 { n } else { fib(n-1) + fib(n-2) } }

#[no_mangle]
pub extern "C" fn rust_compute() -> u32 { fib(10) }

— runs this chain, every single stage a wasm guest under wasmtime:

[1/6] mrustc.wasm:  libcore  → C      (-W exceptions=y, MRUSTC_TARGET_VER=1.74)
[2/6] mrustc.wasm:  prog.rs  → C
[3/6] clang.wasm:   core.c   → core.o
[4/6] clang.wasm:   prog.c   → prog.o
[5/6] wasm-ld:      crt1.o + objs + -lc + libclang_rt.builtins.a → prog.wasm
[6/6] wasmtime run prog.wasm
   → RUST IN WASM SANDBOX: 55

fib(10) is 55, and the kicker is the memory number: the build peaked around 600 MB — the wasm32 4 GB ceiling everyone worries about never bit. The live lane runs mrustc under wasmtime with -W exceptions=y -W max-wasm-stack=134217728 and the env MRUSTC_TARGET_VER=1.74, STD_ENV_ARCH=wasm32, TMPDIR=/tmp; its clang pass uses --target=wasm32-wasip1 -fwasm-exceptions -mllvm -wasm-enable-sjlj. The caller surface is one function: Workbooks.Compilers.rust_compile_to_wasm(rs, deps: ["[email protected]"]).

Full std works too — a program printing HELLO FROM RUST STD IN WASM: 42, and a sum(1..=10)=55 std proof. The non-obvious part: libstd is prebuilt BY mrustc.wasm, not by a native rustc — hash and ABI consistency demand the same compiler build std and user code. If std isn't prebuilt, the lane returns {:error, {:libstd_not_prebuilt, dir}} rather than silently shelling out to a native rustc. That refusal is the doctrine in one return value.

the dependency FRONTIER

Source compiling is table stakes; dependencies are where it gets real. The Rust lane fetches crates from the crates.io sparse index at index.crates.io plus the static CDN, and compiles them transitively, in-sandbox, through the same chain as user code. Version resolution walks a fallback: try the exact pin, then a curated floor, then newest-first, capped at six candidates, with an edition fallback from the declared edition down to 2021. Twenty-three crates are proven as of the same 2026-06-07 proof run, including fnv, byteorder, base64, memchr, ryu, bitflags, num-traits, smallvec, lazy_static, cfg-if, anyhow, once_cell, log, bytes, and regex-syntax.

The capability map, honestly drawn — what compiles, by what mechanism:

| crate class | works? | mechanism |
| --- | --- | --- |
| pure-source crates (fnv, base64, memchr…) | yes (23 proven) | sparse-index fetch → mrustc → clang, transitively |
| declarative-macro crates (macro_rules!) | yes | mrustc expands them natively in-pass |
| build.rs autocfg probes | yes | skipped; the conservative fallback is correct |
| version-pinned deps (regex, syn, serde) | yes, with floors | @version_floors papers over the language ceiling |
| proc-macro derives (#[derive(Serialize)]) | bridge built, ceiling-gated | WASM token-server + host-import spawn (see below) |
| codegen build.rs (include!(env!("OUT_DIR"))) | not yet | frontier: listed, not solved |

The floors and hints are real constants in the lane: @version_floors pins regex to 1.5.4, syn to 1.0.109, serde to 1.0.156; @feature_hints turns on std and unicode-perl for regex so the right code path compiles. These exist to dodge the mrustc language ceiling — roughly Rust 1.74 today (up from 1.54 after the version bump). Newer releases like syn-2 or edition-2024 don't compile; the floors hold the lane on versions that do.
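
One way to read the resolution order described earlier (exact pin, then curated floor, then newest-first, capped at six) is as an ordered list of versions to try. This is a sketch of that reading, not the lane's code: candidate_versions is a hypothetical name, the available-version lists are illustrative rather than real index data, and versions are assumed to be plain dotted numbers with no pre-release tags. The floor values are the ones quoted above.

```python
from typing import Optional

# Floors quoted in the text (@version_floors).
VERSION_FLOORS = {"regex": "1.5.4", "syn": "1.0.109", "serde": "1.0.156"}
MAX_CANDIDATES = 6


def version_key(version: str) -> tuple[int, ...]:
    """Sort key for plain 'major.minor.patch' strings (assumed format)."""
    return tuple(int(part) for part in version.split("."))


def candidate_versions(crate: str, pin: Optional[str], available: list[str]) -> list[str]:
    """Order versions to try: exact pin, curated floor, then newest-first."""
    ordered = []
    if pin is not None:
        ordered.append(pin)
    floor = VERSION_FLOORS.get(crate)
    if floor is not None:
        ordered.append(floor)
    ordered.extend(sorted(available, key=version_key, reverse=True))

    # Drop duplicates while keeping first-seen order, then apply the cap.
    unique = list(dict.fromkeys(ordered))
    return unique[:MAX_CANDIDATES]


# Illustrative listing: the floor is tried before any newer release.
print(candidate_versions("syn", None, ["2.0.50", "2.0.49", "1.0.109", "1.0.108"]))
# ['1.0.109', '2.0.50', '2.0.49', '1.0.108']
```

Trying the floor before the newest release is what keeps the lane on versions that fit under the language ceiling without the caller having to pin anything.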

Proc-macros are the big unlock, and the hardest. mrustc natively pipe()s and posix_spawn()s a proc-macro executable — forbidden under WASI. The in-sandbox bridge compiles each proc-macro crate to a WASM "server" speaking mrustc's own token byte-protocol, patches the spawn into a host-import, and runs mrustc under Wasmex (Workbooks.ProcMacroHost.run_mrustc) so the host fields the spawn. It even spoofs the target spec — reporting os-name = "linux" while the arch stays wasm32 — so syn's not(all(wasm32, os in (unknown, wasi))) cfg-guard passes. The bridge is built; the fight is the same language ceiling that gates everything else. proc-macro2 compiles and runs in-sandbox; the heavier syn-driven derives push right up against mrustc's frontier. We'd rather state that plainly than imply #[derive(Serialize)] is uniformly green end-to-end today.

The remaining frontier is a short, named list — codegen build.rs, proc-macros, the language ceiling, and runtime capabilities like tokio's net and threads (routed to BEAM-mediated Dock imports). The repo's own note on them is the one worth keeping: none are JIT-class walls.

the bootstrap compiler that only speaks C

The Zig lane runs zig 0.16.0's stage1/zig1.wasm — the bootstrap compiler. It runs the full Zig frontend and Sema and the C-backend codegen inside wasm, and emits C, which then takes the clang lane. Its #+KIND is compile-to-c, the second trick.

zig1 is bootstrap-only on purpose: every backend except the C backend is disabled, and the archiver and linker commands are stubbed out — build-exe panics on ar_command. It can do exactly one thing, build-obj -ofmt=c, which is precisely what this lane needs. Zig resolves every path against a single preopen, so the lane stages the per-job directory beside lib/ under one zig-root/ and preopens the whole root. The argv is exact:

zig1 build-obj -ofmt=c -OReleaseSmall --zig-lib-dir lib \
     --cache-dir jobs/<id>/zc --global-cache-dir jobs/<id>/gc \
     --name out -femit-bin=jobs/<id>/out.c \
     -target wasm32-wasi -Mroot=jobs/<id>/src.zig

Two committed bridges close the run step. wasi_shim.c forwards the bare WASI externs Zig declares — fd_write, proc_exit — onto wasi-libc's __wasi_* functions; and a prelude no-ops __builtin_return_address and __builtin_frame_address. The link runs with crt: false to avoid a duplicate _start. A std.debug.print program then runs and prints zig-e2e=42; a hello compile emits about 800 KB of genuine Zig C-backend output, headed by #include "zig.h". One subtle correctness fix: an empty out.c is now treated as a failed compile — zig1 can emit a zero-byte file on error, which used to slip through as success and surface as a misleading "_start undefined" link error later.

The honest wall: direct .zig→wasm in-sandbox — a self-hosted wasm backend built to wasi — is blocked upstream. os.realpath isn't available on WASI (ziglang/zig#20665) and there are open codegen bugs. So the lane goes through C, and says so.

don't compile it AT ALL

The Go lane takes the fourth kind, interpret-in-sandbox. It runs yaegi, traefik's pure-Go interpreter, cross-compiled to wasip1 once by the native Go toolchain as trusted provisioning: GOOS=wasip1 GOARCH=wasm go build. The native Go toolchain only ever builds the trusted runner; it never touches user code.

The elegant bit is how user source rides in. The package manager concatenates the runner with the untrusted Go source embedded in a wasm custom section named wbgosrc. The built module is literally these bytes:

yaegi-run.wasm
  ++ <<0>>                          ← custom-section id
  ++ leb128(payload_length)
  ++ "wbgosrc"                       ← section name
  ++ <the Go source>
  ++ <<byte_size(source)::big-64>>   ← length trailer
  ++ "WBGOSRC1"                       ← magic

This buys two properties at once. Custom sections are ignored at execution, so the file is still a perfectly valid, runnable yaegi module — nothing about the embedding breaks it. And because the source is now part of the bytes, each program is a unique, content-addressable, self-contained wasm — the runner extracts the source with a single pread of the 16-byte trailer (<length::big-64> followed by WBGOSRC1), no side files. The limit is verbatim from the manifest: stdlib only — no external module deps — single main package. Multi-file packages get merged into the one source yaegi evaluates.
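
The embedding and the trailer read can be sketched in a few lines. Assumptions are stated up front: the function names are hypothetical, the real package manager is not shown on this page, and the sketch length-prefixes the section name as the standard wasm binary format requires for custom sections, a detail the layout above leaves implicit.

```python
import struct

MAGIC = b"WBGOSRC1"
SECTION_NAME = b"wbgosrc"
TRAILER_SIZE = 16  # 8-byte big-endian length followed by the 8-byte magic


def leb128(value: int) -> bytes:
    """Unsigned LEB128, the varint encoding wasm uses for section sizes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def embed_source(runner: bytes, source: bytes) -> bytes:
    """Append the Go source to the runner as a trailing custom section (id 0)."""
    payload = (
        leb128(len(SECTION_NAME)) + SECTION_NAME  # name length prefix: assumed
        + source
        + struct.pack(">Q", len(source))          # length trailer, big-endian 64-bit
        + MAGIC
    )
    return runner + b"\x00" + leb128(len(payload)) + payload


def extract_source(module: bytes) -> bytes:
    """Recover the source by reading only the final 16 bytes, then slicing."""
    trailer = module[-TRAILER_SIZE:]
    if len(trailer) != TRAILER_SIZE or trailer[8:] != MAGIC:
        raise ValueError("no WBGOSRC1 trailer found")
    (length,) = struct.unpack(">Q", trailer[:8])
    end = len(module) - TRAILER_SIZE
    if length > end:
        raise ValueError("trailer length exceeds module size")
    return module[end - length:end]


runner = b"\x00asm\x01\x00\x00\x00"  # stand-in for yaegi-run.wasm
program = b"package main\n\nfunc main() {}\n"
assert extract_source(embed_source(runner, program)) == program
```

Putting the length and magic at the very end is what makes extraction a single fixed-size read: the runner never has to walk the section table to find its own source.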

embed the interpreter PER program

The JS lane is the one that bootstraps itself. QuickJS-ng v0.10.0 is compiled to wasm objects by the clang lane: js/build.sh compiles quickjs, cutils, libregexp, libunicode and xsum with llvm.core.wasm under wasmtime. The sandbox builds its own JS interpreter.

Then each program embeds it. The user's JS bytes become a C byte array, and the whole thing compiles and links into a standalone wasm per program:

const char wb_js_src[] = {104,101,108,108,111, … ,0};
const unsigned wb_js_len = N;

// link line shape:
wasm-ld crt1-command.o harness.o js_src.o \
        quickjs.o cutils.o libregexp.o libunicode.o xsum.o \
        -lc … -o out.wasm
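
Generating that C file from the user's bytes is mechanical. A minimal sketch, assuming wb_js_len counts the source bytes without the trailing NUL (the page writes only N); js_to_c_source is a hypothetical helper name.

```python
def js_to_c_source(js: bytes) -> str:
    """Render JS source bytes as the two C definitions the harness links against."""
    values = [str(byte) for byte in js]
    values.append("0")  # trailing NUL terminator, as in the example above
    array = ",".join(values)
    return (
        f"const char wb_js_src[] = {{{array}}};\n"
        f"const unsigned wb_js_len = {len(js)};\n"  # assumed to exclude the NUL
    )


print(js_to_c_source(b"hello"), end="")
# const char wb_js_src[] = {104,101,108,108,111,0};
# const unsigned wb_js_len = 5;
```

Emitting decimal byte values rather than a string literal sidesteps every escaping question: quotes, backslashes, and newlines in the user's JS need no special handling.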

The bridge is harness.c, which provides the Javy contract: Javy.IO.readSync(fd, u8) and writeSync(fd, u8) bound directly onto wasi-libc read() and write(), plus console.log and a hand-rolled TextEncoder/TextDecoder JS prelude — because quickjs-ng ships neither. It deliberately does not link quickjs-libc, whose POSIX bits (environ, signals, popen) don't exist under WASI. No JIT, no native javy.

TypeScript reuses this whole stack: the real tsc (typescript.js) runs inside QuickJS (qjs-run.wasm /w/tsjob.js), takes TS on stdin and type-strips via ts.transpileModule to JS on stdout, which then takes the JS lane.

And the speed valve. QuickJS is an interpreter, so bundling many files through it is slow. The bundler routes esbuild-first: esbuild compiled to wasip1 runs under wasmtime, which JITs it to native. A multi-file JS/TS/JSX bundle that takes about 23 minutes interpreted in QuickJS runs in about 160 ms this way. It falls back to the QuickJS bundler only when a Node core module needs the Dock shims. (For host capabilities, a harness_dock.o variant links env.* imports and must run under Workbooks.JsDock, not the bare wasmtime CLI.)

the honest LEDGER

The doctrine is stated in the repo and worth quoting: done equals an honest answer per language — each lane ships either a working in-sandbox compiler or a committed honest-blocker note. No stubs. Here is the ledger:

| lane | status | hard limit | tracking |
| --- | --- | --- | --- |
| C (clang.wasm) | in-sandbox | compile/link split (no fork under WASI) | fm0.1 |
| Zig (zig1→C) | in-sandbox | C backend only; direct→wasm blocked (zig#20665) | fm0.2 |
| Rust (mrustc→C) | in-sandbox | ~1.74 language ceiling; proc-macros gated | fm0.3 |
| JS (QuickJS) | in-sandbox | interpreter; esbuild is the speed path | fm0.4 |
| Go (yaegi) | in-sandbox | stdlib only; single main package | fm0.5 |
| TS (tsc→JS) | in-sandbox | type-strip only; rides the JS lane | fm0.6 |
| jco (typed WIT) | native, by design | runs JS under Node+wizer at build; can't be a guest | fm0.7 |

jco is the one acknowledged native holdout — and it's outside the untrusted path on purpose. It executes JS under Node and wizer at build time to make typed WIT components, which are optional; the core dataflow uses the six in-sandbox lanes. The same is true of wac: it stays native because tokio won't target wasi, but it only composes already-built, validated trusted components — byte manipulation, no untrusted execution. (wasm-tools, for the record, does run in-sandbox.)

The most important honesty isn't a footnote — it's a return value. The old native build isolator Workbooks.Sandbox was deleted: with every untrusted compile in-wasm, there's no native compile left to isolate. The wasm sandbox is the boundary now. And the native fallbacks were ripped out, not hidden — ask for a native Rust crate build and you get {:error, {:lane_unavailable, :rust_crate_native_build}}; native Go, :go_package_native_build; native Zig, :zig_native_build. The lane fails loudly rather than quietly shelling out. That's the whole posture: a refusal you can see beats a fallback you can't.

questions people actually ASK

Can I use serde and #[derive(Serialize)]?

Pure-source crates: yes — 23 are proven. serde with a version floor compiles. The #[derive] macros are the frontier: the bridge that runs proc-macros in-sandbox is built (a WASM token-server plus a host-import spawn under Wasmex, with a spoofed target spec so syn's cfg-guard passes), but it's gated on the same mrustc ~1.74 language ceiling. proc-macro2 runs in-sandbox today; heavier syn-driven derives push right against that ceiling. Honest answer: lean on declarative macros where you can, and treat derives as in-progress rather than uniformly green.

Why is my JS bundle fast — or slow?

Routing. A multi-file bundle goes through esbuild compiled to wasip1, which wasmtime JITs to native — about 160 ms where the QuickJS interpreter would take ~23 minutes. You only fall back to the slow QuickJS bundler when a Node core module needs the Dock shims. A single program embeds QuickJS directly and runs interpreted, which is plenty fast for one file.

Can a malicious source escape during the compile?

The compiler is just another sandboxed program. It runs under wasmtime with a fixed preopen set — sysroot at /usr, the job dir at /work, /tmp writable — bounded by fuel and a wall clock, and the job dir is deleted after. There's no native build to attack, because there is no native build. The wasm sandbox is the boundary, not a layer on top of one.

Why not just bwrap the real rustc?

Because the repo's design doc calls that out as insufficient: a native compile under bwrap or seatbelt is not a sufficient boundary for adversarial untrusted source on a shared host. OS sandboxes are defense-in-depth, not isolation — which is the reason microVMs exist. Running the compiler as a wasm guest moves the boundary to where it actually holds.

Where do the compiler wasms come from?

Trusted provisioning, once. The compiler binaries are built by native toolchains — build.sh, provision-rust.sh — but those build only the trusted tools, never user code. clang is sha-pinned from npm; mrustc is built to wasm32-wasi; yaegi is cross-compiled by native Go. The derived artifacts are gitignored on purpose, and DeployKit regenerates them.

Why does everything go through C?

Because the only honest way to emit wasm for a compiled language inside the sandbox is LLVM, and the only LLVM that runs as a wasm guest is the clang multitool. So Rust and Zig transpile to C and hand off; even the JS lane compiles an embedded interpreter through clang. C is the bottleneck language of the sandbox — not by accident, by the shape of the available tools.

keep GOING

This deep dive sits under the toolkit lesson — start there for the bet in two sentences, then follow the funnel that decides what reaches a lane.