your DAG runs SINGLE-FILE
The parent lesson showed a board clearing in three rounds and an org file compiling into a world. But "rounds" sounds sequential, and every orchestrator you've ever used taught you that parallelism is something you configure. Airflow has executors and pools. GitHub Actions has runners and concurrency groups. Step Functions has Map states. The matrix is YAML you write by hand. So a fair suspicion walks in with you: "the plan executes" probably means a loop runs the tasks one at a time.
It doesn't. And the reason is the most satisfying kind — there's no
trick, just a definition you already had the pieces for. The dependency
edges you wrote (one component's :out feeding another's
:in) are already the concurrency declaration. Nothing
else is needed, because nothing else is true: two steps that don't depend
on each other can run together, and the file already says which steps
those are.
the DEFINITION
wave, n. 1. The set of steps whose predecessors are all done, executed simultaneously. A compiled world clears in waves, not rounds: wave N is everything that became ready once wave N−1 finished.
The word matters because "round" implies a turn taken by one actor, and a wave is the opposite — a whole front advancing together. The executor that does this lives in two functions, about seventy lines of Elixir total. There is no queue, no broker, no worker pool, no scheduler config. The plan file is the spec, and the engine reads it.
computing the WAVES
Here is the entire scheduler, in English. Take the world's edges and
group them by consumer, so each step knows its predecessors. Then peel:
a wave is every remaining step whose predecessors are all in the
done set. Run that wave, add its steps to done,
and peel again. When nothing remains, stop. That's waves/2 —
a recursion that reverses out the layers.
Take a five-step world. fetch and seed depend on nothing. clean consumes fetch; score consumes both clean and seed; report consumes score. Group the edges, peel the ready sets, and the graph falls into four bands, colored here by the wave they run in:
flowchart LR
    fetch["fetch"]
    seed["seed"]
    clean["clean"]
    score["score"]
    report["report"]
    fetch --> clean
    clean --> score
    seed --> score
    score --> report
    style fetch fill:#13d943,stroke:#121316,stroke-width:2.5px
    style seed fill:#13d943,stroke:#121316,stroke-width:2.5px
    style clean fill:#a8d4f0,stroke:#121316
    style score fill:#f3c5a3,stroke:#121316
    style report fill:#f2ddb0,stroke:#121316
Read the colors as time. Wave one is the green pair — fetch and seed have no predecessors, so they go at once. Wave two is clean alone: fetch is done, so it's ready, but score can't move because seed and clean aren't both finished yet. Wave three is score — now both its predecessors are done. Wave four is report. Nobody wrote that order; it fell out of the edges. And notice seed: it had a free seat in wave one even though score wouldn't need it until wave three. The wave engine spends idle capacity the instant a step is eligible, not the instant it's strictly required.
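The peel described above can be sketched in a few lines. This is a Python illustration of the idea, not the engine's Elixir `waves/2`; the function name `compute_waves` and the `(producer, consumer)` edge tuples are assumptions made for the example, and raising on an empty ready set is a choice of this sketch only.

```python
from collections import defaultdict

def compute_waves(steps, edges):
    """Partition steps into waves: each wave is every remaining step
    whose predecessors are all in the done set."""
    preds = defaultdict(set)
    for producer, consumer in edges:
        preds[consumer].add(producer)  # group edges by consumer

    remaining = list(steps)
    done = set()
    waves = []
    while remaining:
        ready = [s for s in remaining if preds[s] <= done]
        if not ready:
            # Sketch-only behavior: stop loudly when nothing is ready.
            raise ValueError(f"no ready step among {remaining}")
        waves.append(ready)
        done.update(ready)
        remaining = [s for s in remaining if s not in done]
    return waves

# The five-step world from the text, as (producer, consumer) pairs.
steps = ["fetch", "seed", "clean", "score", "report"]
edges = [
    ("fetch", "clean"),
    ("clean", "score"),
    ("seed", "score"),
    ("score", "report"),
]

waves = compute_waves(steps, edges)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Nothing in the input names an order. The four waves come out of the subset test `preds[s] <= done` alone.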
the accumulator is the PIPE
Within a wave, steps run. Between waves, data flows — and the mechanism
is almost too plain to call a mechanism. The executor keeps one map:
name → output. Each step's result lands in it under the step's
name. When the next step runs, it looks up its single producer in that map
and receives that string as its standard input. The accumulator map
is the pipe.
Concretely: build a producer index — for every edge, remember
consumer → producer. When a step runs, its input is
acc[producer] if it has an inbound edge, or the workflow's
original input if it's a root. The producer's stdout (trimmed) becomes the
consumer's stdin. Picture a two-step world — Upper uppercases its
input, Count reports the length of Upper's result:
sequenceDiagram
participant I as input "hello world"
participant U as Upper (wave 1)
participant A as acc — the map
participant C as Count (wave 2)
I->>U: root step — gets the workflow input
U->>A: acc["Upper"] = {upper: HELLO WORLD}
A->>C: in_data = acc["Upper"]
C->>A: acc["Count"] = {len: 11}
Note over A: Upper's stdout WAS Count's stdin
Walk it: the input hello world reaches Upper because Upper
is a root — it has no inbound edge, so it gets the workflow's original
input. Upper writes {upper: HELLO WORLD} into the map under
its own name. Count has one inbound edge from Upper, so its input is
exactly that string — eleven characters once uppercased — and it writes
{len: 11}. Two steps, two waves, one pipe. Any
:out that no step consumes becomes one of the world's
exports — the result you read off the top.
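A minimal sketch of the accumulator as the pipe, assuming the two-step Upper and Count world from the walk-through. The Python functions stand in for compiled components, and the names `STEPS`, `EDGES`, `WAVES`, and `run` are illustrative, not the engine's.

```python
import json

def upper(stdin: str) -> str:
    """Stand-in for the Upper component: uppercases its input."""
    return json.dumps({"upper": stdin.upper()}, separators=(",", ":"))

def count(stdin: str) -> str:
    """Stand-in for the Count component: length of Upper's result."""
    text = json.loads(stdin)["upper"]
    return json.dumps({"len": len(text)}, separators=(",", ":"))

STEPS = {"Upper": upper, "Count": count}
EDGES = [("Upper", "Count")]    # (producer, consumer)
WAVES = [["Upper"], ["Count"]]  # precomputed for this tiny world

def run(workflow_input: str) -> dict:
    # Producer index: for every edge, remember consumer -> producer.
    producer_of = {consumer: producer for producer, consumer in EDGES}
    acc = {}  # name -> output; this map is the pipe
    for wave in WAVES:
        results = {}
        for name in wave:
            producer = producer_of.get(name)
            # Root steps get the workflow input; others read acc[producer].
            in_data = acc[producer] if producer is not None else workflow_input
            results[name] = STEPS[name](in_data).strip()
        acc.update(results)
    return acc

acc = run("hello world")
producers = {producer for producer, _ in EDGES}
exports = [name for name in acc if name not in producers]
print(acc)
print(exports)
```

The `exports` list at the end applies the rule from the text: an output that no step consumes is what you read off the top.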
One sharp limit lives here, and it's load-bearing for the honesty
section: a component has at most one inbound edge. The kernel gives
each component a single optional :in, so fan-in isn't
expressible — a step can't read two producers at once. Fan-out is
free, though: many components can name the same producer's
:out, and every one of them receives that single output.
eight at a TIME
Depth rung — skippable, but it's where the budgets live. A wave runs
through Task.async_stream with max_concurrency: 8
and a per-slot timeout of ten minutes. So at most eight steps
of a wave run truly in parallel; a wider wave queues the rest behind those
eight. The ten-minute slot is generous on purpose — it has to cover an
agent step thinking, or a component compiling in-sandbox for the first
time.
But each individual WASM step lives under a much tighter budget than the wave that holds it. The two budgets answer different questions, and it's worth seeing them side by side — the wave timeout protects the whole orchestration from a hung step; the per-step caps protect the engine from a single runaway component:
| limit | value | what it protects |
|---|---|---|
| wave concurrency | 8 at once | the shared engine — a wide wave can't stampede the box |
| wave-slot timeout | 600,000 ms (10 min) | the orchestration — covers agent thinking + first compile |
| WASM step timeout | 30,000 ms | one component — a tight filter can't spin forever |
| WASM step fuel | 5,000,000,000 | one component — bounds work even under the time cap |
| stdin cap | 64 MiB | memory — an oversized pipe is rejected, not OOM'd |
| argv cap | 256 KiB | the launch path — header args stay bounded |
The verdict of that table in one line: a compiled filter effectively gets thirty seconds and five billion fuel units, while the wave slot holding it gets ten minutes — because the slot might instead be holding an agent, or a step that's compiling itself from source for the very first time. The numbers are constants in the engine, not knobs you set per workflow. That's a real limitation, and it's in the honesty section too.
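To make the eight-wide limit concrete, here is a rough Python analogue using a thread pool. It is not `Task.async_stream`, it does not model the ten-minute slot timeout or the WASM fuel cap, and `run_wave` and `slow_step` are hypothetical names for this sketch. It only shows a wide wave queuing behind a fixed number of slots.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 8  # the wave concurrency from the table

def run_wave(wave, step_fn):
    """Run every step of a wave, at most MAX_CONCURRENCY at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        futures = {name: pool.submit(step_fn, name) for name in wave}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # a failure becomes a value, not a crash
                results[name] = ("error", repr(exc))
    return results

# Instrumented step so the sketch can observe how many run at once.
active = 0
peak = 0
lock = threading.Lock()

def slow_step(name):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # pretend to do work
    with lock:
        active -= 1
    return f"{name}:ok"

wave = [f"step{i}" for i in range(20)]  # wider than eight
results = run_wave(wave, slow_step)
print(len(results), "steps finished; peak concurrency", peak)
```

All twenty steps finish, but the observed peak never exceeds eight: the rest of the wave waits for a free slot.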
agents in the DAG
Here is the second aha, and it costs the engine nothing. A step's
language is usually a compiler target — JavaScript, Rust, Zig. But one
language is special: agent. When a component's source block is
tagged agent, the source block is the system prompt and
the piped input is the task. The step runs a full
agent — model, tools, the loop — with
max_steps: 6, and its final answer is the step's output, slotted
into the same accumulator map as any compiled filter.
The wave engine doesn't know agents exist. It takes a step_fn
— inject the pure-WASM runner and you get a compiler pipeline; inject the
agent-aware one and the same scheduler fans out language models. So "run
three sub-agents in parallel" needs no new construct. It's three
agent components with no edges between them: no edges
means no predecessors, which means one wave, which means simultaneous.
Here are two analysts on the same topic, edge-free:
flowchart TD
    topic["topic: the safety of modern nuclear power"]
    topic --> fact["Fact :component: — agent<br/>one key fact"]
    topic --> risk["Risk :component: — agent<br/>one key risk"]
    style fact fill:#13d943,stroke:#121316,stroke-width:2.5px
    style risk fill:#13d943,stroke:#121316,stroke-width:2.5px
    style topic fill:#fbfaf6,stroke:#121316
Both nodes are green — same wave. Fact and Risk each get
the topic as their task, run their own agent loop at the same time under
the eight-wide limit, and land as two entries in the results. This is the
brandnana-style sub-agent fan-out, expressed as two headlines in a plan
file. The org for it is just two agent components and nothing
joining them:
* Research fan-out :workflow:
** Fact :component:
#+begin_src agent
You are a concise analyst. Your task is a topic. Reply with ONE key fact about it.
#+end_src
** Risk :component:
#+begin_src agent
You are a concise risk analyst. Your task is a topic. Reply with ONE key risk about it.
#+end_src
An agent step gets six steps of its own loop, where a standalone agent
gets twelve — a workflow fans out many, so each is kept lean. And an agent
in a wave has the same tools any agent has: an in-WASM shell, the virtual
filesystem, the wb CLI, the ability to file an issue. No native
exec — the real-bash hatch was deleted. The agent is a step, not an
escape.
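The injection idea can be sketched as a scheduler that takes its step runner as an argument. Everything here is illustrative: `run_world`, `fake_agent_step`, and the component dictionaries are hypothetical, no model is called, and the loop runs a wave's steps one after another where the engine would run them concurrently.

```python
def run_world(steps, edges, workflow_input, step_fn):
    """Peel waves and run each ready step through the injected step_fn."""
    producer_of = {consumer: producer for producer, consumer in edges}
    acc, waves = {}, []
    remaining = list(steps)
    while remaining:
        ready = [s for s in remaining
                 if s not in producer_of or producer_of[s] in acc]
        if not ready:
            break  # sketch only: cycles are not modelled here
        outputs = {}
        for name in ready:
            producer = producer_of.get(name)
            in_data = acc[producer] if producer is not None else workflow_input
            outputs[name] = step_fn(steps[name], in_data)
        acc.update(outputs)
        waves.append(ready)
        remaining = [s for s in remaining if s not in acc]
    return waves, acc

def fake_agent_step(component, task):
    """Stand-in for an agent-aware runner: source is the system prompt,
    piped input is the task. It only echoes both back."""
    if component["lang"] != "agent":
        raise ValueError("this sketch only handles agent components")
    return f"system={component['source']!r} task={task!r}"

steps = {
    "Fact": {"lang": "agent", "source": "Reply with ONE key fact."},
    "Risk": {"lang": "agent", "source": "Reply with ONE key risk."},
}
topic = "the safety of modern nuclear power"

# No edges: both components have no predecessors, so they share a wave.
waves, tasks = run_world(steps, [], topic, fake_agent_step)
print(waves)
```

Swapping `fake_agent_step` for a different runner changes what a step does without touching the scheduler, which is the point the text makes about `step_fn`.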
why wave two is INSTANT
Depth rung. Run the same world twice and the second run is dramatically
faster, for two reasons that sit underneath the wave engine. First, the
build cache is content-addressed: a component's source hashes to a
filename, and identical source returns the already-built
.wasm tagged :cached — no recompile. Second, the
wasmtime compilation cache remembers the machine code wasmtime
generated from that module, so a cached run skips the JIT entirely.
That second cache has a war story. Without it, every run JIT-compiled the module from scratch — which on a throttled shared vCPU took minutes. The symptom looked like "the wasm shell hangs"; the cause was recompilation, every single time. Turning the cache on was the fix. The difference shows up starkly between a cold first run and a warm repeat:
| | build (source → wasm) | run (wasm → output) |
|---|---|---|
| first run | full compile in-sandbox | full JIT from scratch |
| repeat run | cache hit — :cached | wasmtime cache — no JIT |
The verdict: a wave's first pass pays for compilation; every pass after pays almost nothing. This is why the ten-minute slot exists at all — it's sized for that expensive first compile, not for steady state. In steady state the same wave is gone in a blink.
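A content-addressed build cache is small enough to sketch. This Python version assumes SHA-256 as the hash and an in-memory dictionary as the store; the source does not say which hash or storage the engine uses, and `BuildCache` and `pretend_compile` are hypothetical names. The wasmtime compilation cache is a separate layer and is not modelled here.

```python
import hashlib

class BuildCache:
    """Identical source maps to the same filename, so it is built once."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self.store = {}    # filename -> built artifact
        self.compiles = 0  # how many real compiles happened

    def build(self, source: str):
        # The filename is derived from the source content itself.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        filename = digest + ".wasm"
        if filename in self.store:
            return ("cached", filename, self.store[filename])
        self.compiles += 1
        self.store[filename] = self.compile_fn(source)
        return ("built", filename, self.store[filename])

def pretend_compile(source: str) -> bytes:
    """Stand-in for the expensive in-sandbox compile."""
    return b"\x00asm" + source.encode("utf-8")

cache = BuildCache(pretend_compile)
first = cache.build("export fn run() {}")
second = cache.build("export fn run() {}")   # same source: cache hit
other = cache.build("export fn other() {}")  # different source: new build
print(first[0], second[0], other[0], cache.compiles)
```

Because the key is the content, there is no invalidation step: changing the source changes the filename, and the old artifact is simply never asked for again.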
what comes BACK
A run returns a record, not a log. For each workflow it's a map: the
workflow name, its schedule, the world's exports, a tasks map
of name → output, and a list of sub_workflows.
Nested :workflow: headlines recurse — each sub-world runs its
own waves and produces its own record. Here is a real Pipeline world's
result, abridged:
{
  "workflow": "Pipeline",
  "schedule": { "...": "..." },
  "exports": ["..."],
  "tasks": {
    "Upper": "{\"upper\":\"HELLO WORLD\"}",
    "Count": "{\"len\":11}"
  },
  "sub_workflows": [
    {
      "workflow": "Audit",
      "tasks": { "Echo": "audited:hello world" }
    }
  ]
}
Two facts in that record earn their place. Upper's stdout
literally was Count's stdin: the pipe, visible in the
output. And the sub-workflow Audit ran its own Echo
step on hello world and returned audited:hello world, which proves a
sharp boundary: sub-workflows receive the original top-level input, not
the parent's outputs. Edges never cross a workflow boundary. A sub-world
is a fresh run with the same input, not a downstream consumer of the parent.
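The recursion and the input boundary can be sketched together. This Python version builds only the `workflow`, `tasks`, and `sub_workflows` fields; `schedule` and `exports` are left out because their shape is elided in the record above. The workflow dictionaries and `run_workflow` are assumptions of the sketch, with steps listed in dependency order for brevity.

```python
import json

def compact(obj) -> str:
    return json.dumps(obj, separators=(",", ":"))

def run_workflow(wf, workflow_input):
    """Run one world, then recurse into nested worlds with the SAME input."""
    tasks = {}
    for name, fn, producer in wf["steps"]:
        in_data = tasks[producer] if producer is not None else workflow_input
        tasks[name] = fn(in_data)
    return {
        "workflow": wf["name"],
        "tasks": tasks,
        # Sub-worlds get the original input, never the parent's tasks.
        "sub_workflows": [
            run_workflow(sub, workflow_input)
            for sub in wf.get("sub_workflows", [])
        ],
    }

pipeline = {
    "name": "Pipeline",
    "steps": [
        ("Upper", lambda s: compact({"upper": s.upper()}), None),
        ("Count", lambda s: compact({"len": len(json.loads(s)["upper"])}), "Upper"),
    ],
    "sub_workflows": [
        {"name": "Audit", "steps": [("Echo", lambda s: "audited:" + s, None)]},
    ],
}

record = run_workflow(pipeline, "hello world")
print(json.dumps(record, indent=2))
```

Echo sees `hello world`, not Upper's or Count's output, because the recursive call passes `workflow_input` straight through.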
You reach all of this over HTTP. POST /api/workflow with an
org string and an input returns the records; add ?plan=1 and
you get the schedules and task names without executing — a dry run
of the plan. The curl is short:
curl -s -X POST https://<engine>/api/workflow \
-d '{"org": "<the Pipeline org>", "input": "hello world"}'
# dry run — schedules + task names only, nothing executes:
curl -s -X POST 'https://<engine>/api/workflow?plan=1' \
-d '{"org": "..."}'
where it BITES
Honesty section. The wave model is small and sharp, and sharp edges cut.
No fan-in. A component has one optional :in, so a step
can't read two producers. If C needs both A and B, you can't wire it
directly — you add a downstream merge step, or you wait for a typed
composition path to make it expressible. Fan-out is free; fan-in is a
feature that isn't here yet.
Errors flow as values. A failed step doesn't halt the wave — its
task value becomes an error tuple, and the run completes. That's a feature
until it isn't: the failure is data you can read in the result. But a
consumer downstream of a failed producer receives that error tuple as its
input, which is non-binary, which the WASM lane rejects — and today it
rejects it with a misleading input_too_large label even when the
real cause is the poisoned upstream. The error propagates; the label lies.
We'd rather you knew.
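Since errors travel as values, a caller can walk the result to find where a failure actually started. The sketch below models a simple chain in Python. The `("error", reason)` tuple shape, the `non_binary_input` label, and every function name are hypothetical; per the text above, the engine's real label for the downstream rejection is currently `input_too_large`.

```python
def run_step(fn, in_data):
    """Run one step. Failures are returned as values, never raised."""
    if not isinstance(in_data, bytes):
        # Input guard: an upstream error tuple is not binary data.
        return ("error", "non_binary_input")  # hypothetical label
    try:
        return fn(in_data)
    except Exception as exc:
        return ("error", str(exc))

def run_chain(chain, workflow_input):
    tasks, prev = {}, workflow_input
    for name, fn in chain:
        prev = run_step(fn, prev)
        tasks[name] = prev  # the run keeps going after a failure
    return tasks

def is_error(value):
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "error"

def root_failures(chain, tasks):
    """Failed steps whose own producer did not fail: the real causes."""
    roots, prev_failed = [], False
    for name, _ in chain:
        failed = is_error(tasks[name])
        if failed and not prev_failed:
            roots.append(name)
        prev_failed = failed
    return roots

def boom(_):
    raise RuntimeError("boom")

chain = [("A", lambda b: b.upper()), ("B", boom), ("C", lambda b: b[::-1])]
tasks = run_chain(chain, b"hello")
print(tasks)
print(root_failures(chain, tasks))
```

C fails only because B did. Reading the record from the top of the chain, rather than trusting C's label, points at the poisoned producer.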
Cycles aren't validated. Validation checks two things — a
component with no source, and an :in with no upstream producer.
It does not check for cycles. When the executor can't find any
ready step but steps remain, it lumps all the survivors into one final
wave and runs them at once with nil input — which the WASM lane
rejects as non-binary. So a cycle doesn't deadlock or hang; it degenerates
into a final all-at-once wave that fails at the input guard. No hang, but a
confusing failure rather than a clean "you have a cycle."
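Until validation covers cycles, a caller could check for them before submitting a plan. The sketch below is a client-side helper, not part of the engine; `find_cycle` and the `consumer -> producer` dictionary are assumptions. It relies on the single-inbound-edge rule: with at most one producer per step, finding a cycle is just following producer pointers until they end or repeat.

```python
def find_cycle(producer_of):
    """Return the steps forming a cycle, or None if the plan is acyclic.

    producer_of maps each consumer to its single producer.
    """
    for start in producer_of:
        seen = []
        node = start
        # Walk upstream until we fall off a root or revisit a step.
        while node is not None and node not in seen:
            seen.append(node)
            node = producer_of.get(node)
        if node is not None:
            # We came back to a step already on this walk.
            return seen[seen.index(node):]
    return None

acyclic = {"clean": "fetch", "report": "clean"}
cyclic = {"B": "A", "A": "B"}

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

Running this first turns the confusing all-at-once failure into a plain message naming the steps involved.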
The constants are constants. Eight-wide concurrency, the ten-minute slot, the thirty-second step timeout — none of these are per-workflow knobs today. And sub-workflows getting the original input (not the parent's outputs) is a real constraint, not a bug: it means you compose across worlds by other means, not by piping a parent's results into a child.
questions people actually ASK
How do I run two agents in parallel?
Omit the edges. Two agent components with no
:in pointing at each other have no predecessors, so they land
in the same wave and run simultaneously under the eight-wide limit. Each
gets the workflow input as its task; each lands as an entry in
tasks. Parallelism is the absence of a dependency,
not a flag.
What if a step hangs?
It's caught by two nets. A WASM step has a thirty-second timeout and a five-billion fuel cap, so a spinning component is killed. The wave slot has a ten-minute ceiling above that. Either way the step's value becomes an error tuple, the run completes, and the failure is data in the result — not a hung process you have to go find.
Can step C read both A and B?
Not directly: a component has at most one inbound edge. To merge two producers, add a downstream step that reads one of them and gets the other's data some other way, or restructure so a single producer carries the merged payload. Fan-in is a known gap; fan-out (many consumers, one producer) is free.
Is this just Airflow?
No broker, no worker pool, no executor config. The plan file is the spec, and about seventy lines of Elixir partition it into waves and run them. Airflow makes parallelism something you provision; here it's computed from the edges you already wrote. There isn't room for an Airflow in seventy lines.
Does the schedule fire the workflow on its own?
A world's schedule is surfaced in the plan and the run record — you can read "every day at six" off it. But autonomous firing is the engine's keeper tier, not a cron loop hidden in the wave engine. Treat the schedule as declared intent that a keeper acts on, not as a guarantee that the wave engine is watching the clock.
What happens to an unconsumed output?
It becomes an export. Any component :out that no
other component names as its :in is collected into the world's
exports — the results you read off the top of the run. The upgrade gate
protects those: exports may grow but never shrink. The wave contract is
exactly what that gate guards.
keep GOING
Waves are how a compiled world runs. The parent has the plan; the siblings have the world, the nesting, and the worker.