the compiler is the ATTACK surface
Installing software means running a compiler. Building a Rust crate runs
build.rs; installing an npm package runs an install script;
compiling C runs a driver that spawns a linker. Every one of those is
arbitrary code executing natively, with your permissions, before you've
read a line of what you pulled down. The artifact isn't the danger — the
build is.
The reflex answer is an OS sandbox: wrap the native compile in
bwrap on Linux or sandbox-exec on a Mac and let it
run. The repo's own design doc rejects that in plain words: "a native compile under bwrap or seatbelt is not a sufficient boundary for
adversarial untrusted source on a shared host." OS sandboxes are
defense-in-depth, not isolation. That gap is exactly why microVMs exist.
So the canon here is harder than "sandbox the build." It's that untrusted source never compiles or runs natively, at all. Every lane compiles and runs user source entirely under wasmtime — and to do that, the compiler itself has to be a wasm guest. The bet of this whole layer is that a compiler is just another program, and any program can be a guest.
what a LANE is
A lane is one source language's path from untrusted text to runnable, sandboxed wasm, where every executing stage is itself a wasm guest running under wasmtime, never a native process.
Lanes live one-per-language under runtime/compilers/<lang>/,
and they share one framework — Workbooks.Compilers — instead of
six bespoke integrations. Each lane is just a directory with a
manifest.org: a handful of org keywords the framework parses to
learn how the lane behaves. A directory becomes a lane the moment it has one.
#+COMPILER: rust
#+CLI_BIN: mrustc
#+KIND: compile-to-c-then-wasm
#+BUILD: build.sh
#+TARGET: wasm32-wasip1
#+SOURCE: thepowersgang/mrustc @ be69c747
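Because a manifest is only keyword lines, parsing it takes very little code. The following is a minimal sketch. It assumes one `#+KEY: value` per line and that a repeated key overwrites the earlier one. `parse_manifest` is a hypothetical name, not the Workbooks.Compilers parser.

```python
import re

# One org keyword line, e.g. "#+KIND: compile-to-c-then-wasm".
KEYWORD = re.compile(r"^#\+([A-Z_]+):\s*(.*?)\s*$")

def parse_manifest(text: str) -> dict[str, str]:
    """Collect #+KEY: value lines into a dict and ignore every other line."""
    keys: dict[str, str] = {}
    for line in text.splitlines():
        match = KEYWORD.match(line.strip())
        if match:
            keys[match.group(1)] = match.group(2)
    return keys

MANIFEST = """\
#+COMPILER: rust
#+CLI_BIN: mrustc
#+KIND: compile-to-c-then-wasm
#+BUILD: build.sh
#+TARGET: wasm32-wasip1
#+SOURCE: thepowersgang/mrustc @ be69c747
"""

lane = parse_manifest(MANIFEST)
print(lane["KIND"])    # compile-to-c-then-wasm
print(lane["TARGET"])  # wasm32-wasip1
```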
The load-bearing keyword is #+KIND, and there are exactly four
shapes a lane can be. They name the three tricks plus the one that skips
compiling entirely:
The compiler binaries are wasm artifacts, content-addressed by the
sha256 of their bytes and stored as build/commands/<sha>.wasm. The
same is true of what they produce: an output is named by its hash, so identical source
yields an identical artifact, deduplicated for free.
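Deduplication costs nothing because the path is a pure function of the bytes. The following is a minimal sketch. It assumes a local directory layout rooted at `build/commands/`. `store_artifact` is a hypothetical helper, not the framework's function.

```python
import hashlib
import tempfile
from pathlib import Path

def store_artifact(root: Path, wasm_bytes: bytes) -> Path:
    """Store bytes at <root>/build/commands/<sha256>.wasm and return that path.

    The file name depends only on the bytes, so storing the same bytes
    twice resolves to the same file and the second write is skipped.
    """
    digest = hashlib.sha256(wasm_bytes).hexdigest()
    path = root / "build" / "commands" / f"{digest}.wasm"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(wasm_bytes)  # a production store would write-then-rename
    return path

with tempfile.TemporaryDirectory() as tmp:
    header = b"\x00asm\x01\x00\x00\x00"  # minimal wasm module header
    first = store_artifact(Path(tmp), header)
    again = store_artifact(Path(tmp), header)
    print(first == again)  # True
```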
six lanes, three TRICKS
Here is the whole layer on one page. Six source languages enter on the
left. Two paths never reach a compiler: the c4 VM (for C) and yaegi (for Go) short-circuit straight
to "run." The rest each take a different route to wasm. Then comes the
thing to watch: almost everything funnels into the same two calls,
clang -c to make an object and wasm-ld to link it.
Both calls are served by one 75 MB LLVM multitool. C is the bottleneck language
of the sandbox, by design:
flowchart LR
  c[".c"] --> cc["clang -c"]
  zig[".zig"] --> z1["zig1.wasm<br/>→ C"] --> cc
  rs[".rs"] --> mr["mrustc.wasm<br/>→ C"] --> cc
  js[".js"] --> jc["js_src.c<br/>byte-array"] --> cc
  ts[".ts"] --> tsc["tsc in QuickJS<br/>→ .js"] --> jc
  c4lane[".c (c4 VM)"]:::run
  go[".go"] --> yaegi["yaegi-run.wasm<br/>interpret"]:::run
  cc["llvm.core.wasm<br/>clang -c"] --> ld["llvm.core.wasm<br/>wasm-ld"] --> out["out.wasm"]
  out --> run["wasmtime run"]:::run
  yaegi --> run
  c4lane --> run
  classDef run fill:#13d943,stroke:#121316,stroke-width:2px;
  style cc fill:#f3c5a3,stroke:#121316,stroke-width:2.5px
  style ld fill:#f3c5a3,stroke:#121316,stroke-width:2.5px
  style out fill:#fbfaf6,stroke:#121316
Read it as three tricks. Interpret it — c4 runs C in a tiny VM, yaegi
interprets Go, QuickJS interprets JS, and a real tsc runs inside
QuickJS to strip types. Transpile it to C — mrustc turns Rust into C,
zig1 emits C from its C backend. Run a real LLVM someone already built for
wasi — the keystone clang lane. Whatever the front half does, the back half
is C, and C means clang plus wasm-ld. The rest of this page walks the lanes one
at a time, each as a trick, a proof, and a limit.
the keystone: a linker that can't be SPAWNED
Depth rung. Start here because everything else converges here. The
production C lane is YoWASP clang and lld, version 22.1.0 — LLVM built
for the wasm32-wasi target, so it runs on wasmtime. It's
prebuilt and sha-pinned from the @yowasp/clang npm package, not
built in-house. The single file llvm.core.wasm is a ~75 MB
multitool that dispatches on its first argument to act as either clang
or wasm-ld, and it imports exactly one thing —
wasi_snapshot_preview1. Nothing else.
Why clang and not something smaller? Because the plan was reshaped by one hard finding: no no-LLVM compiler emits wasm. tcc and chibicc emit native code; c4 interprets. A real C-to-wasm compiler running inside wasm is, necessarily, clang and LLVM. There's no lighter path that's honest.
And there's a wall that shapes the whole lane: WASI has no fork. The
clang driver cannot spawn a subprocess under WASI, so it can't do its usual
trick of internally invoking the linker. Compile and link become two separate
llvm.core.wasm invocations the host orchestrates by hand:
sequenceDiagram
  participant H as host (Workbooks.Compilers)
  participant W as wasmtime
  H->>W: run llvm.core.wasm clang -c hello.c → hello.o
  Note over W: /usr sysroot · /work job dir · /tmp writable
  W-->>H: hello.o (object)
  H->>W: run llvm.core.wasm wasm-ld hello.o -lc -z stack-size=8388608
  Note over W: 8 MiB stack (wasm-ld's 64 KiB default overflows modest buffers)
  W-->>H: hello.wasm
  H->>W: run wasmtime hello.wasm
  W-->>H: 10!=3628800
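The driver cannot fork, so the host has to issue the compile and the link as two separate guest runs. The following sketch shows how a host might build those two argument vectors. It assumes the multitool takes the tool name (`clang` or `wasm-ld`) as its first argument, as described above. The `-o` flags and the function names are illustrative assumptions. A real link line also carries crt1.o and the sysroot paths.

```python
STACK_BYTES = 8 * 1024 * 1024  # 8 MiB; wasm-ld's stock default is 64 KiB

def compile_argv(source: str, obj: str) -> list[str]:
    """First guest run: clang compiles one translation unit to an object."""
    return ["clang", "-c", source, "-o", obj]

def link_argv(objects: list[str], out: str) -> list[str]:
    """Second guest run: wasm-ld links the objects against wasi-libc."""
    return ["wasm-ld", *objects, "-lc", "-z", f"stack-size={STACK_BYTES}", "-o", out]

def plan_c_build(source: str) -> list[list[str]]:
    """Ordered guest invocations. The host runs each one to completion before
    starting the next, because no guest can spawn another process."""
    stem = source.rsplit(".", 1)[0]
    return [compile_argv(source, f"{stem}.o"), link_argv([f"{stem}.o"], f"{stem}.wasm")]

for argv in plan_c_build("hello.c"):
    print("llvm.core.wasm", " ".join(argv))
```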
The mounts are the same every job: the sysroot at /usr, the
per-job directory at /work, a writable /tmp. The
8 MiB stack is a deliberate default — wasm-ld's stock 64 KiB overflows
on modest automatic buffers, so the lane links with
-z stack-size=8388608. A compiler gets raised budgets too: where an
ordinary command run caps at 5 billion fuel and 30 seconds, a clang
invocation gets 800 billion fuel and 180 seconds — generous, but still a
hard runaway-trap, not infinity.
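The two budget tiers are easier to compare as data. The following sketch assumes a simple lookup keyed on whether the guest is a compiler. The `Budget` type and the `budget_for` helper are hypothetical. Only the numbers come from the lane.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Budget:
    fuel: int          # wasmtime fuel units before the guest traps
    wall_seconds: int  # wall-clock limit before the host stops the run

ORDINARY = Budget(fuel=5_000_000_000, wall_seconds=30)
COMPILER = Budget(fuel=800_000_000_000, wall_seconds=180)

def budget_for(is_compiler: bool) -> Budget:
    """Compilers get the raised tier and every other guest gets the ordinary cap.
    Both tiers are finite, so a runaway guest always traps eventually."""
    return COMPILER if is_compiler else ORDINARY

print(COMPILER.fuel // ORDINARY.fuel)                  # 160
print(COMPILER.wall_seconds // ORDINARY.wall_seconds)  # 6
```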
Multi-file builds compile their objects in parallel, bounded to
min(schedulers-1, 6) at a time — worth roughly 1.7× on a real
workload (a 33-file Lua build dropped from 280 s to 166 s). And the
proof you can run yourself: Compilers.compile_and_run_c("hello.c")
prints 10!=3628800, every stage above a wasm guest. (The c4 VM —
rswier's "C in four functions" — still ships as the spike that first proved the
model; the tcc productionization it once planned was superseded by this clang
lane.)
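Each compile is a separate guest run, so a thread pool is enough to sketch the bounded fan-out. The sketch makes three assumptions:

- `schedulers` stands for the host's scheduler count.
- The bound is clamped to at least one worker. The formula alone yields zero on a single-scheduler host.
- `compile_one` is a hypothetical stand-in for one clang -c guest run.

```python
from concurrent.futures import ThreadPoolExecutor

def worker_bound(schedulers: int) -> int:
    """min(schedulers - 1, 6), clamped so a one-scheduler host still makes progress."""
    return max(1, min(schedulers - 1, 6))

def compile_all(sources, compile_one, schedulers: int):
    """Compile every source with at most worker_bound(schedulers) jobs in flight.
    pool.map returns results in input order, which keeps the link line stable."""
    with ThreadPoolExecutor(max_workers=worker_bound(schedulers)) as pool:
        return list(pool.map(compile_one, sources))

objects = compile_all([f"lua{i}.c" for i in range(33)],
                      lambda src: src[:-2] + ".o",  # stand-in for a clang -c guest run
                      schedulers=8)
print(len(objects), objects[0])  # 33 lua0.o
print(round(280 / 166, 2))       # 1.69, the "roughly 1.7x" of the Lua build
```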
Rust without RUSTC
Depth rung, and the crown jewel. The obvious move would be
rustc.wasm. It doesn't exist, and it can't easily be built, for reasons
that are themselves a tour of the WASI walls. rustc embeds LLVM as a library
and shells out to a proc-macro server and a linker as subprocesses, which is
impossible without fork. It wants threads and realpath, neither
available under WASI. Nobody ships a prebuilt rustc.wasm, and the
alternative Cranelift backend, cg_clif, emits native code with no wasm32 target.
So the lane uses mrustc — a C++ program that compiles Rust to C.
Built to wasm32-wasi, it runs in-sandbox; its C output goes
straight into the clang lane above. The end-to-end proof is real and dated
(2026-06-07). An honest Rust program —
fn fib(n: u32) -> u32 { if n < 2 { n } else { fib(n-1) + fib(n-2) } }
#[no_mangle]
pub extern "C" fn rust_compute() -> u32 { fib(10) }
— runs this chain, every single stage a wasm guest under wasmtime:
[1/6] mrustc.wasm: libcore → C (-W exceptions=y, MRUSTC_TARGET_VER=1.74)
[2/6] mrustc.wasm: prog.rs → C
[3/6] clang.wasm: core.c → core.o
[4/6] clang.wasm: prog.c → prog.o
[5/6] wasm-ld: crt1.o + objs + -lc + libclang_rt.builtins.a → prog.wasm
[6/6] wasmtime run prog.wasm → RUST IN WASM SANDBOX: 55
fib(10) is 55, and the kicker is the memory number: the build peaked around
600 MB — the wasm32 4 GB ceiling everyone worries about never
bit. The live lane runs mrustc under wasmtime with
-W exceptions=y -W max-wasm-stack=134217728 and the env
MRUSTC_TARGET_VER=1.74, STD_ENV_ARCH=wasm32,
TMPDIR=/tmp; its clang pass uses
--target=wasm32-wasip1 -fwasm-exceptions -mllvm -wasm-enable-sjlj.
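Collected in one place, the flags above form a fixed recipe. The following sketch assembles a wasmtime command line for the mrustc pass and the flag list for its clang pass, using only the flags and environment variables named above. `--env NAME=VALUE` is a standard wasmtime CLI option. However, the lane may drive wasmtime through an embedding rather than the CLI. The mrustc arguments are left to the caller because they are not given here.

```python
MRUSTC_WASMTIME_FLAGS = ["-W", "exceptions=y", "-W", "max-wasm-stack=134217728"]  # 128 MiB
MRUSTC_ENV = {"MRUSTC_TARGET_VER": "1.74", "STD_ENV_ARCH": "wasm32", "TMPDIR": "/tmp"}
CLANG_FLAGS = ["--target=wasm32-wasip1", "-fwasm-exceptions", "-mllvm", "-wasm-enable-sjlj"]

def mrustc_command(mrustc_args: list[str]) -> list[str]:
    """wasmtime run <flags> --env K=V ... mrustc.wasm <mrustc args>."""
    env_flags = [item for key, value in MRUSTC_ENV.items()
                 for item in ("--env", f"{key}={value}")]
    return ["wasmtime", "run", *MRUSTC_WASMTIME_FLAGS, *env_flags, "mrustc.wasm", *mrustc_args]

def clang_pass(c_file: str, obj: str) -> list[str]:
    """Arguments for the llvm.core.wasm clang step that compiles mrustc's C output."""
    return ["clang", *CLANG_FLAGS, "-c", c_file, "-o", obj]

# Placeholder argument; a real mrustc run needs more (crate type, library paths).
print(" ".join(mrustc_command(["prog.rs"])))
print(" ".join(clang_pass("prog.c", "prog.o")))
```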
The caller surface is one function:
Workbooks.Compilers.rust_compile_to_wasm(rs, deps: ["<crate>@<version>"]).
Full std works too: one proof program prints
HELLO FROM RUST STD IN WASM: 42, and another prints
sum(1..=10)=55. The non-obvious part: libstd is
prebuilt BY mrustc.wasm, not by a native rustc — hash and ABI consistency
demand the same compiler build std and user code. If std isn't prebuilt, the
lane returns {:error, {:libstd_not_prebuilt, dir}} rather than
silently shelling out to a native rustc. That refusal is the doctrine in one
return value.
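The refusal pattern is clearest as control flow: the check happens up front, and the only alternative to success is an explicit error. The following Python sketch mirrors the `{:error, ...}` tuple as `("error", ("libstd_not_prebuilt", dir))`. Two things are assumptions: what counts as "prebuilt" (here, a directory that contains at least one file) and the function names.

```python
import tempfile
from pathlib import Path

def rust_compile_with_std(source: str, libstd_dir: str, compile_in_sandbox):
    """Compile against a libstd that mrustc.wasm built earlier, or refuse loudly.

    There is deliberately no branch that falls back to a native rustc.
    """
    std_dir = Path(libstd_dir)
    # Assumption: "prebuilt" means the directory exists and is not empty.
    if not std_dir.is_dir() or not any(std_dir.iterdir()):
        return ("error", ("libstd_not_prebuilt", libstd_dir))
    return ("ok", compile_in_sandbox(source, std_dir))

with tempfile.TemporaryDirectory() as empty_dir:
    print(rust_compile_with_std("fn main() {}", empty_dir, lambda src, std: b"wasm"))
```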
the dependency FRONTIER
Source compiling is table stakes; dependencies are where it gets
real. The Rust lane fetches crates from the crates.io sparse index at
index.crates.io plus the static CDN, and compiles them
transitively — in-sandbox, same chain as user code. Version resolution walks a
fallback: try the exact pin, then a curated floor, then newest-first, capped at
six candidates, with an edition fallback from the declared edition down to
2021. Twenty-three crates are proven as of that same date — including
fnv, byteorder, base64, memchr, ryu, bitflags, num-traits, smallvec,
lazy_static, cfg-if, anyhow, once_cell, log, bytes, and regex-syntax.
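The resolution order reads naturally as a candidate list. The sketch makes three assumptions:

- Versions are plain `MAJOR.MINOR.PATCH`, with no pre-release tags.
- The floor is tried as an exact version.
- The edition fallback only ever steps down to 2021.

The floor values mirror the `@version_floors` constants described in this section. `version_candidates` and `edition_candidates` are hypothetical names.

```python
from typing import Optional

VERSION_FLOORS = {"regex": "1.5.4", "syn": "1.0.109", "serde": "1.0.156"}
MAX_CANDIDATES = 6
KNOWN_EDITIONS = [2015, 2018, 2021, 2024]

def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))

def version_candidates(crate: str, available: list[str], pin: Optional[str] = None) -> list[str]:
    """Exact pin first, then the curated floor, then newest-first; at most six."""
    ordered: list[str] = []
    for preferred in (pin, VERSION_FLOORS.get(crate)):
        if preferred in available and preferred not in ordered:
            ordered.append(preferred)
    for version in sorted(available, key=parse, reverse=True):
        if version not in ordered:
            ordered.append(version)
    return ordered[:MAX_CANDIDATES]

def edition_candidates(declared: int) -> list[int]:
    """The declared edition first, then older editions down to (not below) 2021."""
    older = [e for e in KNOWN_EDITIONS if 2021 <= e < declared]
    return [declared, *sorted(older, reverse=True)]

print(version_candidates("regex", ["1.0.0", "1.5.4", "1.9.0", "1.10.0"], pin="1.9.0"))
print(edition_candidates(2024))
```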
The capability map, honestly drawn — what compiles, by what mechanism:
| crate class | works? | mechanism |
|---|---|---|
| pure-source crates (fnv, base64, memchr…) | yes, 23 proven | sparse-index fetch → mrustc → clang, transitively |
| declarative-macro crates (macro_rules!) | yes | mrustc expands them natively in-pass |
| build.rs autocfg probes | yes | skipped; the conservative fallback is correct |
| version-pinned deps (regex, syn, serde) | yes, with floors | @version_floors papers over the language ceiling |
| proc-macro derives (#[derive(Serialize)]) | bridge built, ceiling-gated | WASM token-server + host-import spawn; see below |
| codegen build.rs (include!(env!("OUT_DIR"))) | not yet | frontier: listed, not solved |
The floors and hints are real constants in the lane:
@version_floors pins regex to 1.5.4, syn to 1.0.109, serde to
1.0.156; @feature_hints turns on std and
unicode-perl for regex so the right code path compiles. These exist
to dodge the mrustc language ceiling — roughly Rust 1.74 today (up from
1.54 after the version bump). Newer releases like syn-2 or edition-2024 don't
compile; the floors hold the lane on versions that do.
Proc-macros are the big unlock, and the hardest. mrustc natively
pipe()s and posix_spawn()s a proc-macro executable —
forbidden under WASI. The in-sandbox bridge compiles each proc-macro crate to a
WASM "server" speaking mrustc's own token byte-protocol, patches the spawn into
a host-import, and runs mrustc under Wasmex
(Workbooks.ProcMacroHost.run_mrustc) so the host fields the spawn.
It even spoofs the target spec — reporting os-name = "linux" while
the arch stays wasm32 — so syn's
not(all(wasm32, os in (unknown, wasi))) cfg-guard passes. The
bridge is built; the fight is the same language ceiling that gates everything
else. proc-macro2 compiles and runs in-sandbox; the heavier syn-driven derives
push right up against mrustc's frontier. We'd rather state that plainly than
imply #[derive(Serialize)] is uniformly green end-to-end today.
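The spoof works because the guard is a plain boolean over two target facts. The following is a Python model of how the cfg resolves, reading the guard as `not(all(target_arch = "wasm32", any(target_os = "unknown", target_os = "wasi")))`. It is an illustration only, neither syn's source nor mrustc's cfg evaluator.

```python
GUARDED_OSES = {"unknown", "wasi"}

def proc_macro_guard_passes(target_arch: str, target_os: str) -> bool:
    """Model of not(all(target_arch = "wasm32", target_os in {unknown, wasi}))."""
    return not (target_arch == "wasm32" and target_os in GUARDED_OSES)

print(proc_macro_guard_passes("wasm32", "wasi"))   # False: honest target, guard blocks
print(proc_macro_guard_passes("wasm32", "linux"))  # True: spoofed os-name, guard passes
```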
The remaining frontier is a short, named list — codegen build.rs, proc-macros, the language ceiling, and runtime capabilities like tokio's net and threads (routed to BEAM-mediated Dock imports). The repo's own note on them is the one worth keeping: none are JIT-class walls.
the bootstrap compiler that only speaks C
The Zig lane runs zig 0.16.0's stage1/zig1.wasm — the
bootstrap compiler. It runs the full Zig frontend and Sema and the C-backend
codegen inside wasm, and emits C, which then takes the clang lane. Its
#+KIND is compile-to-c, the second trick.
zig1 is bootstrap-only on purpose: every backend except the C backend is
disabled, and the archiver and linker commands are stubbed out —
build-exe panics on ar_command. It can do exactly one
thing, build-obj -ofmt=c, which is precisely what this lane needs.
Zig resolves every path against a single preopen, so the lane stages the
per-job directory beside lib/ under one zig-root/ and
preopens the whole root. The argv is exact:
zig1 build-obj -ofmt=c -OReleaseSmall --zig-lib-dir lib \
--cache-dir jobs/<id>/zc --global-cache-dir jobs/<id>/gc \
--name out -femit-bin=jobs/<id>/out.c \
-target wasm32-wasi -Mroot=jobs/<id>/src.zig
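Only the job id varies between runs, so the argv is easy to generate per job. The following sketch reproduces the argument list above for a given job id. `zig1_argv` is a hypothetical helper, and the paths are relative to the single preopened zig-root/.

```python
def zig1_argv(job_id: str) -> list[str]:
    """Arguments for zig1.wasm, relative to the preopened zig-root/ directory."""
    job = f"jobs/{job_id}"
    return [
        "zig1", "build-obj", "-ofmt=c", "-OReleaseSmall",
        "--zig-lib-dir", "lib",
        "--cache-dir", f"{job}/zc", "--global-cache-dir", f"{job}/gc",
        "--name", "out", f"-femit-bin={job}/out.c",
        "-target", "wasm32-wasi", f"-Mroot={job}/src.zig",
    ]

print(" ".join(zig1_argv("42")))
```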
Two committed bridges close the run step. wasi_shim.c forwards
the bare WASI externs Zig declares — fd_write,
proc_exit — onto wasi-libc's __wasi_* functions; and a
prelude no-ops __builtin_return_address and
__builtin_frame_address. The link runs with crt: false
to avoid a duplicate _start. A std.debug.print program
then runs and prints zig-e2e=42; a hello compile emits about
800 KB of genuine Zig C-backend output, headed by
#include "zig.h". One subtle correctness fix: an empty
out.c is now treated as a failed compile — zig1 can emit a
zero-byte file on error, which used to slip through as success and surface as a
misleading "_start undefined" link error later.
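The empty-output fix is a one-line guard. Where it sits matters: it runs before the C file reaches clang, so the failure is reported as a compile error instead of a confusing link error. The following sketch assumes the host sees zig1's exit status and the emitted path. The names and the tuple shape are illustrative.

```python
import tempfile
from pathlib import Path

def check_zig_output(exit_code: int, out_c: Path):
    """Treat a non-zero exit, a missing file, or a zero-byte out.c as a failed compile."""
    if exit_code != 0:
        return ("error", ("zig1_failed", exit_code))
    if not out_c.is_file() or out_c.stat().st_size == 0:
        # zig1 can leave a zero-byte file behind on error; without this check
        # it would only surface later as "_start undefined" at link time.
        return ("error", ("zig1_empty_output", str(out_c)))
    return ("ok", out_c)

with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "out.c"
    empty.write_bytes(b"")
    print(check_zig_output(0, empty)[0])  # error
```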
The honest wall: direct .zig→wasm in-sandbox — a self-hosted
wasm backend built to wasi — is blocked upstream. os.realpath isn't
available on WASI (ziglang/zig#20665) and there are open codegen bugs. So the
lane goes through C, and says so.
don't compile it AT ALL
The Go lane takes the fourth trick: interpret-in-sandbox. It
runs yaegi, traefik's pure-Go interpreter, cross-compiled to wasip1 by
the native Go toolchain once — trusted provisioning,
GOOS=wasip1 GOARCH=wasm go build. The native Go toolchain only
ever builds the trusted runner; it never touches user code.
The elegant bit is how user source rides in. The package manager
concatenates the runner with the untrusted Go source embedded in a wasm
custom section named wbgosrc. The built module is literally
these bytes:
yaegi-run.wasm
++ <<0>>                           ← custom-section id
++ leb128(payload_length)          ← size of everything that follows
++ <<7>> ++ "wbgosrc"              ← section name, length-prefixed
++ <the Go source>
++ <<byte_size(source)::big-64>>   ← length trailer
++ "WBGOSRC1"                      ← magic
This buys two properties at once. Custom sections are ignored at
execution, so the file is still a perfectly valid, runnable yaegi module —
nothing about the embedding breaks it. And because the source is now part of
the bytes, each program is a unique, content-addressable, self-contained
wasm — the runner extracts the source with a single pread of
the 16-byte trailer (<length::big-64> followed by
WBGOSRC1), no side files. The limit is verbatim from the manifest:
stdlib only — no external module deps — single main package. Multi-file
packages get merged into the one source yaegi evaluates.
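The following sketch covers both directions: embedding the source and extracting it again. It assumes the trailer is the last 16 bytes of the file and sits inside the custom section's payload. It has to, or the module would end in bytes that do not belong to any section. The section layout follows the WebAssembly binary format for custom sections, where the name is itself length-prefixed. The function names are hypothetical.

```python
import struct

SECTION_NAME = b"wbgosrc"
MAGIC = b"WBGOSRC1"

def uleb128(n: int) -> bytes:
    """Unsigned LEB128, the variable-length integer encoding wasm uses for sizes."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def embed_go_source(runner: bytes, source: bytes) -> bytes:
    """Append a custom section (id 0) whose payload ends with the 16-byte trailer."""
    trailer = struct.pack(">Q", len(source)) + MAGIC  # big-endian u64 length + magic
    payload = uleb128(len(SECTION_NAME)) + SECTION_NAME + source + trailer
    return runner + b"\x00" + uleb128(len(payload)) + payload

def extract_go_source(module: bytes) -> bytes:
    """Recover the source using only the last 16 bytes (a runner would pread them)."""
    if module[-8:] != MAGIC:
        raise ValueError("missing WBGOSRC1 trailer")
    (length,) = struct.unpack(">Q", module[-16:-8])
    end = len(module) - 16
    return module[end - length:end]

runner = b"\x00asm\x01\x00\x00\x00"  # stand-in for yaegi-run.wasm: a bare module header
module = embed_go_source(runner, b'package main\n\nfunc main() { println("hi") }\n')
print(extract_go_source(module).decode())
```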
embed the interpreter PER program
The JS lane is the one that bootstraps itself. QuickJS-ng v0.10.0 is
compiled to wasm objects by the clang lane —
js/build.sh compiles quickjs, cutils, libregexp, libunicode and
xsum with llvm.core.wasm under wasmtime. The sandbox builds its own
JS interpreter.
Then each program embeds it. The user's JS bytes become a C byte array, and the whole thing compiles and links into a standalone wasm per program:
const char wb_js_src[] = {104,101,108,108,111, … ,0};
const unsigned wb_js_len = N;
// link line shape:
wasm-ld crt1-command.o harness.o js_src.o \
quickjs.o cutils.o libregexp.o libunicode.o xsum.o \
-lc … -o out.wasm
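The following sketch generates the C translation unit that becomes js_src.o. It assumes that `wb_js_len` counts only the script bytes and not the trailing NUL, which is one reading of N above. `js_to_c_array` is a hypothetical helper.

```python
def js_to_c_array(js_source: bytes) -> str:
    """Render JS bytes as the C source that is compiled to js_src.o.

    The array ends with a NUL so C code can also treat it as a string.
    Assumption: wb_js_len counts the script bytes only, not that NUL.
    """
    body = ",".join(str(b) for b in js_source + b"\0")
    return (
        f"const char wb_js_src[] = {{{body}}};\n"
        f"const unsigned wb_js_len = {len(js_source)};\n"
    )

print(js_to_c_array(b"hello"), end="")
```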
The bridge is harness.c, which provides the Javy contract:
Javy.IO.readSync(fd, u8) and writeSync(fd, u8) bound
directly onto wasi-libc read() and write(), plus
console.log and a hand-rolled TextEncoder/TextDecoder JS
prelude — because quickjs-ng ships neither. It deliberately does not link
quickjs-libc, whose POSIX bits (environ, signals, popen) don't exist under
WASI. No JIT, no native javy.
TypeScript reuses this whole stack: the real tsc
(typescript.js) runs inside QuickJS
(qjs-run.wasm /w/tsjob.js), takes TS on stdin and type-strips via
ts.transpileModule to JS on stdout, which then takes the JS lane.
And the speed valve. QuickJS is an interpreter, so bundling many files
through it is slow. The bundler routes esbuild-first: esbuild compiled to
wasip1 runs under wasmtime, which JITs it to native. A multi-file
JS/TS/JSX bundle that takes about 23 minutes interpreted in QuickJS runs in
about 160 ms this way. It falls back to the QuickJS bundler only
when a Node core module needs the Dock shims. (For host capabilities, a
harness_dock.o variant links env.* imports and must
run under Workbooks.JsDock, not the bare wasmtime
CLI.)
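The routing rule reduces to a small decision function. The following sketch assumes two inputs: the number of files and whether any import is a Node core module that needs the Dock shims. It leaves out the harness_dock.o variant, and the function and path names are hypothetical.

```python
def choose_js_path(file_count: int, needs_dock_shims: bool) -> str:
    """Pick how a JS/TS program gets built under the routing described above."""
    if file_count > 1:  # a multi-file bundle
        # esbuild (built for wasip1) is compiled to native code by wasmtime;
        # the interpreted QuickJS bundler only takes over for Dock-shimmed modules.
        return "quickjs-bundler" if needs_dock_shims else "esbuild"
    return "embed-quickjs"  # one program: embed the interpreter and run it

print(choose_js_path(12, needs_dock_shims=False))  # esbuild
print(23 * 60 * 1000 // 160)  # 8625: roughly how much faster ~160 ms is than ~23 min
```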
the honest LEDGER
The doctrine is stated in the repo and worth quoting: done equals an honest answer per language — each lane ships either a working in-sandbox compiler or a committed honest-blocker note. No stubs. Here is the ledger:
| lane | status | hard limit | tracking |
|---|---|---|---|
| C (clang.wasm) | in-sandbox | compile/link split — no fork under WASI | fm0.1 |
| Zig (zig1→C) | in-sandbox | C backend only; direct→wasm blocked (zig#20665) | fm0.2 |
| Rust (mrustc→C) | in-sandbox | ~1.74 language ceiling; proc-macros gated | fm0.3 |
| JS (QuickJS) | in-sandbox | interpreter — esbuild is the speed path | fm0.4 |
| Go (yaegi) | in-sandbox | stdlib only; single main package | fm0.5 |
| TS (tsc→JS) | in-sandbox | type-strip only — rides the JS lane | fm0.6 |
| jco (typed WIT) | native, by design | runs JS under Node+wizer at build — can't be a guest | fm0.7 |
jco is the one acknowledged native holdout — and it's outside the untrusted
path on purpose. It executes JS under Node and wizer at build time to make typed
WIT components, which are optional; the core dataflow uses the six in-sandbox
lanes. The same is true of wac: it stays native because tokio won't
target wasi, but it only composes already-built, validated trusted
components — byte manipulation, no untrusted execution. (wasm-tools,
for the record, does run in-sandbox.)
The most important honesty isn't a footnote — it's a return value. The old
native build isolator Workbooks.Sandbox was deleted: with
every untrusted compile in-wasm, there's no native compile left to isolate. The
wasm sandbox is the boundary now. And the native fallbacks were ripped out, not
hidden — ask for a native Rust crate build and you get
{:error, {:lane_unavailable, :rust_crate_native_build}}; native Go,
:go_package_native_build; native Zig,
:zig_native_build. The lane fails loudly rather than quietly
shelling out. That's the whole posture: a refusal you can see beats a fallback
you can't.
questions people actually ASK
Can I use serde and #[derive(Serialize)]?
Pure-source crates: yes — 23 are proven. serde with a version floor
compiles. The #[derive] macros are the frontier: the bridge that
runs proc-macros in-sandbox is built (a WASM token-server plus a host-import
spawn under Wasmex, with a spoofed target spec so syn's cfg-guard passes), but
it's gated on the same mrustc ~1.74 language ceiling. proc-macro2 runs
in-sandbox today; heavier syn-driven derives push right against that ceiling.
Honest answer: lean on declarative macros where you can, and treat derives as
in-progress rather than uniformly green.
Why is my JS bundle fast — or slow?
Routing. A multi-file bundle goes through esbuild compiled to wasip1, which wasmtime JITs to native — about 160 ms where the QuickJS interpreter would take ~23 minutes. You only fall back to the slow QuickJS bundler when a Node core module needs the Dock shims. A single program embeds QuickJS directly and runs interpreted, which is plenty fast for one file.
Can a malicious source escape during the compile?
The compiler is just another sandboxed program. It runs under wasmtime with
a fixed preopen set — sysroot at /usr, the job dir at
/work, /tmp writable — bounded by fuel and a wall
clock, and the job dir is deleted after. There's no native build to attack,
because there is no native build. The wasm sandbox is the boundary, not a
layer on top of one.
Why not just bwrap the real rustc?
Because the repo's design doc calls that out as insufficient: a native compile under bwrap or seatbelt is not a sufficient boundary for adversarial untrusted source on a shared host. OS sandboxes are defense-in-depth, not isolation — which is the reason microVMs exist. Running the compiler as a wasm guest moves the boundary to where it actually holds.
Where do the compiler wasms come from?
Trusted provisioning, once. The compiler binaries are built by native
toolchains — build.sh, provision-rust.sh — but those
build only the trusted tools, never user code. clang is sha-pinned from npm;
mrustc is built to wasm32-wasi; yaegi is cross-compiled by native Go. The
derived artifacts are gitignored on purpose, and DeployKit regenerates them.
Why does everything go through C?
Because the only honest way to emit wasm for a compiled language inside the sandbox is LLVM, and the only LLVM that runs as a wasm guest is the clang multitool. So Rust and Zig transpile to C and hand off; even the JS lane compiles an embedded interpreter through clang. C is the bottleneck language of the sandbox — not by accident, by the shape of the available tools.
keep GOING
This deep dive sits under the toolkit lesson — start there for the bet in two sentences, then follow the funnel that decides what reaches a lane.