waves — how a DAG runs in parallel

your DAG runs SINGLE-FILE

The parent lesson showed a board clearing in three rounds and an org file compiling into a world. But "rounds" sounds sequential, and every orchestrator you've ever used taught you that parallelism is something you configure. Airflow has executors and pools. GitHub Actions has runners and concurrency groups. Step Functions has Map states. The matrix is YAML you write by hand. So a fair suspicion walks in with you: "the plan executes" probably means a loop runs the tasks one at a time.

It doesn't. And the reason is the most satisfying kind — there's no trick, just a definition you already had the pieces for. The dependency edges you wrote (one component's :out feeding another's :in) are already the concurrency declaration. Nothing else is needed, because nothing else is true: two steps that don't depend on each other can run together, and the file already says which steps those are.

the DEFINITION

wave /weɪv/ noun

1. the set of steps whose predecessors are all done, executed simultaneously. A compiled world clears in waves, not rounds: wave N is everything that became ready once wave N−1 finished.

The word matters because "round" implies a turn taken by one actor, and a wave is the opposite — a whole front advancing together. The executor that does this lives in two functions, about seventy lines of Elixir total. There is no queue, no broker, no worker pool, no scheduler config. The plan file is the spec, and the engine reads it.

computing the WAVES

Here is the entire scheduler, in English. Take the world's edges and group them by consumer, so each step knows its predecessors. Then peel: a wave is every remaining step whose predecessors are all in the done set. Run that wave, add its steps to done, and peel again. When nothing remains, stop. That's waves/2: a recursion that peels the layers off one at a time.

Take a five-step world. fetch and seed depend on nothing. clean consumes fetch; score consumes both clean and seed; report consumes score. Group the edges, peel the ready sets, and the graph falls into four bands, colored here by the wave they run in:

flowchart LR
  fetch["fetch"]
  seed["seed"]
  clean["clean"]
  score["score"]
  report["report"]
  fetch --> clean
  clean --> score
  seed --> score
  score --> report
  style fetch fill:#13d943,stroke:#121316,stroke-width:2.5px
  style seed fill:#13d943,stroke:#121316,stroke-width:2.5px
  style clean fill:#a8d4f0,stroke:#121316
  style score fill:#f3c5a3,stroke:#121316
  style report fill:#f2ddb0,stroke:#121316

Read the colors as time. Wave one is the green pair — fetch and seed have no predecessors, so they go at once. Wave two is clean alone: fetch is done, so it's ready, but score can't move because seed and clean aren't both finished yet. Wave three is score — now both its predecessors are done. Wave four is report. Nobody wrote that order; it fell out of the edges. And notice seed: it had a free seat in wave one even though score wouldn't need it until wave three. The wave engine spends idle capacity the instant a step is eligible, not the instant it's strictly required.
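The peel is short enough to sketch. What follows is a Python illustration, not the engine's Elixir: `compute_waves` and the `(producer, consumer)` edge tuples are names assumed for the example, and the sketch simply raises when nothing is ready, which is a simplification of this sketch and not how the engine is described as behaving.

```python
from collections import defaultdict


def compute_waves(steps, edges):
    """Partition steps into waves. `edges` is a list of (producer, consumer) pairs."""
    # Group edges by consumer so each step knows its predecessors.
    preds = defaultdict(set)
    for producer, consumer in edges:
        preds[consumer].add(producer)

    waves, done, remaining = [], set(), list(steps)
    while remaining:
        # A step is ready when every predecessor is already in the done set.
        ready = [s for s in remaining if preds[s] <= done]
        if not ready:
            # Simplification for this sketch only: refuse to continue.
            raise ValueError(f"no ready step among {remaining}")
        waves.append(ready)
        done.update(ready)
        remaining = [s for s in remaining if s not in done]
    return waves


steps = ["fetch", "seed", "clean", "score", "report"]
edges = [("fetch", "clean"), ("clean", "score"), ("seed", "score"), ("score", "report")]
waves = compute_waves(steps, edges)
print(waves)  # [['fetch', 'seed'], ['clean'], ['score'], ['report']]
```

Nothing in the function mentions order or parallelism. The four waves are a consequence of the four edges and nothing else.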

the accumulator is the PIPE

Within a wave, steps run. Between waves, data flows — and the mechanism is almost too plain to call a mechanism. The executor keeps one map: name → output. Each step's result lands in it under the step's name. When the next step runs, it looks up its single producer in that map and receives that string as its standard input. The accumulator map is the pipe.

Concretely: build a producer index — for every edge, remember consumer → producer. When a step runs, its input is acc[producer] if it has an inbound edge, or the workflow's original input if it's a root. The producer's stdout (trimmed) becomes the consumer's stdin. Picture a two-step world — Upper uppercases its input, Count reports the length of Upper's result:

sequenceDiagram
  participant I as input "hello world"
  participant U as Upper (wave 1)
  participant A as acc — the map
  participant C as Count (wave 2)
  I->>U: root step — gets the workflow input
  U->>A: acc["Upper"] = {upper: HELLO WORLD}
  A->>C: in_data = acc["Upper"]
  C->>A: acc["Count"] = {len: 11}
  Note over A: Upper's stdout WAS Count's stdin

Walk it: the input hello world reaches Upper because Upper is a root — it has no inbound edge, so it gets the workflow's original input. Upper writes {upper: HELLO WORLD} into the map under its own name. Count has one inbound edge from Upper, so its input is exactly that string — eleven characters once uppercased — and it writes {len: 11}. Two steps, two waves, one pipe. Any :out that no step consumes becomes one of the world's exports — the result you read off the top.
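The same walk, as a minimal Python sketch. `run_pipeline`, `producer_of`, and the two step functions are hypothetical stand-ins (the real steps are WASM components and the real executor is Elixir); only the shape of the accumulator is taken from the text above.

```python
import json


def run_pipeline(waves, edges, step_fns, workflow_input):
    """Run waves in order, piping each producer's output to its consumer via one dict."""
    # Producer index: consumer -> producer (at most one inbound edge per step).
    producer_of = {consumer: producer for producer, consumer in edges}
    acc = {}  # the accumulator: step name -> output string
    for wave in waves:
        for name in wave:  # sequential here for clarity; the engine runs a wave concurrently
            if name in producer_of:
                in_data = acc[producer_of[name]]  # inbound edge: read the producer's output
            else:
                in_data = workflow_input          # root step: gets the workflow input
            acc[name] = step_fns[name](in_data).strip()
    return acc


step_fns = {
    "Upper": lambda s: json.dumps({"upper": s.upper()}, separators=(",", ":")),
    "Count": lambda s: json.dumps({"len": len(json.loads(s)["upper"])}, separators=(",", ":")),
}
acc = run_pipeline([["Upper"], ["Count"]], [("Upper", "Count")], step_fns, "hello world")
print(acc)  # {'Upper': '{"upper":"HELLO WORLD"}', 'Count': '{"len":11}'}
```

There is no channel, socket, or queue between the two steps. The dictionary lookup is the whole handoff.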

One sharp limit lives here, and it's load-bearing for the honesty section: a component has at most one inbound edge. The kernel gives each component a single optional :in, so fan-in isn't expressible — a step can't read two producers at once. Fan-out is free, though: many components can name the same producer's :out, and every one of them receives that single output.

eight at a TIME

Depth rung — skippable, but it's where the budgets live. A wave runs through Task.async_stream with max_concurrency: 8 and a per-slot timeout of ten minutes. So at most eight steps of a wave run truly in parallel; a wider wave queues the rest behind those eight. The ten-minute slot is generous on purpose — it has to cover an agent step thinking, or a component compiling in-sandbox for the first time.

But each individual WASM step lives under a much tighter budget than the wave that holds it. The two budgets answer different questions, and it's worth seeing them side by side — the wave timeout protects the whole orchestration from a hung step; the per-step caps protect the engine from a single runaway component:

limit	value	what it protects
wave concurrency	8 at once	the shared engine — a wide wave can't stampede the box
wave-slot timeout	600,000 ms (10 min)	the orchestration — covers agent thinking + first compile
WASM step timeout	30,000 ms	one component — a tight filter can't spin forever
WASM step fuel	5,000,000,000	one component — bounds work even under the time cap
stdin cap	64 MiB	memory — an oversized pipe is rejected, not OOM'd
argv cap	256 KiB	the launch path — header args stay bounded

The verdict of that table in one line: a compiled filter effectively gets thirty seconds and five billion fuel units, while the wave slot holding it gets ten minutes — because the slot might instead be holding an agent, or a step that's compiling itself from source for the very first time. The numbers are constants in the engine, not knobs you set per workflow. That's a real limitation, and it's in the honesty section too.
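To make "eight at a time" concrete, here is a rough Python analogue of a bounded wave. It is a sketch under assumptions: a thread pool stands in for `Task.async_stream`, `run_wave` and `slow_step` are invented names, and `result(timeout=...)` only bounds how long the caller waits, so unlike the slot timeout described above it does not stop a step that is still running.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 8   # from the table: at most eight steps of a wave at once
SLOT_TIMEOUT_S = 600  # from the table: 600,000 ms per wave slot


def run_wave(names, step_fn):
    """Run one wave with at most MAX_CONCURRENCY steps in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        futures = [pool.submit(step_fn, name) for name in names]
        return [f.result(timeout=SLOT_TIMEOUT_S) for f in futures]


# Instrumented fake step that records how many steps overlap.
stats = {"in_flight": 0, "peak": 0}
lock = threading.Lock()


def slow_step(name):
    with lock:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
    time.sleep(0.02)  # pretend to work
    with lock:
        stats["in_flight"] -= 1
    return f"{name}:done"


results = run_wave([f"step{i}" for i in range(20)], slow_step)
print(stats["peak"] <= MAX_CONCURRENCY, len(results))  # True 20
```

A twenty-step wave still finishes in one call; the twelve steps that did not get a slot simply wait behind the first eight.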

agents in the DAG

Here is the second aha, and it costs the engine nothing. A step's language is usually a compiler target — JavaScript, Rust, Zig. But one language is special: agent. When a component's source block is tagged agent, the source block is the system prompt and the piped input is the task. The step runs a full agent — model, tools, the loop — with max_steps: 6, and its final answer is the step's output, slotted into the same accumulator map as any compiled filter.

The wave engine doesn't know agents exist. It takes a step_fn — inject the pure-WASM runner and you get a compiler pipeline; inject the agent-aware one and the same scheduler fans out language models. So "run three sub-agents in parallel" needs no new construct. It's three agent components with no edges between them: no edges means no predecessors, which means one wave, which means simultaneous. Here are two analysts on the same topic, edge-free:

flowchart TD
  topic["topic: the safety of modern nuclear power"]
  topic --> fact["Fact :component: - agent<br/>one key fact"]
  topic --> risk["Risk :component: - agent<br/>one key risk"]
  style fact fill:#13d943,stroke:#121316,stroke-width:2.5px
  style risk fill:#13d943,stroke:#121316,stroke-width:2.5px
  style topic fill:#fbfaf6,stroke:#121316

Both nodes are green — same wave. Fact and Risk each get the topic as their task, run their own agent loop at the same time under the eight-wide limit, and land as two entries in the results. This is the brandnana-style sub-agent fan-out, expressed as two headlines in a plan file. The org for it is just two agent components and nothing joining them:

* Research fan-out                              :workflow:
** Fact                                         :component:
   #+begin_src agent
   You are a concise analyst. Your task is a topic.
   Reply with ONE key fact about it.
   #+end_src
** Risk                                         :component:
   #+begin_src agent
   You are a concise risk analyst. Your task is a topic.
   Reply with ONE key risk about it.
   #+end_src

An agent step gets six steps of its own loop, where a standalone agent gets twelve — a workflow fans out many, so each is kept lean. And an agent in a wave has the same tools any agent has: an in-WASM shell, the virtual filesystem, the wb CLI, the ability to file an issue. No native exec — the real-bash hatch was deleted. The agent is a step, not an escape.
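The injection point is easy to picture in code. This Python sketch assumes a `run_wave` that takes a step function, plus two fake runners; none of these names come from the engine, and the fake agent calls no model. It only shows that the scheduler's code path is the same whichever runner is passed in.

```python
from concurrent.futures import ThreadPoolExecutor


def run_wave(steps, task, step_fn):
    """Run every step of one wave at once; the scheduler never inspects what step_fn does."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {name: pool.submit(step_fn, source, task) for name, source in steps.items()}
        return {name: f.result() for name, f in futures.items()}


def fake_wasm_step(source, in_data):
    # Stand-in for a compiled filter: here the "source" is just a Python callable.
    return source(in_data)


def fake_agent_step(source, in_data, max_steps=6):
    # Stand-in for an agent run: source is the system prompt, in_data is the task.
    # No model is called; the string only shows where each piece would go.
    return f"[prompt={source!r} task={in_data!r} max_steps={max_steps}]"


agents = {
    "Fact": "Reply with ONE key fact about it.",
    "Risk": "Reply with ONE key risk about it.",
}
agent_out = run_wave(agents, "the safety of modern nuclear power", fake_agent_step)
filter_out = run_wave({"Upper": str.upper}, "hello world", fake_wasm_step)
print(sorted(agent_out), filter_out)  # ['Fact', 'Risk'] {'Upper': 'HELLO WORLD'}
```

Swapping `fake_wasm_step` for `fake_agent_step` changes what a step is without touching how a wave is run, which is the point of passing the runner in.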

why run two is INSTANT

Depth rung. Run the same world twice and the second run is dramatically faster, for two reasons that sit underneath the wave engine. First, the build cache is content-addressed: a component's source hashes to a filename, and identical source returns the already-built .wasm tagged :cached — no recompile. Second, the wasmtime compilation cache remembers the machine code wasmtime generated from that module, so a cached run skips the JIT entirely.

That second cache has a war story. Without it, every run JIT-compiled the module from scratch — which on a throttled shared vCPU took minutes. The symptom looked like "the wasm shell hangs"; the cause was recompilation, every single time. Turning the cache on was the fix. The difference shows up starkly between a cold first run and a warm repeat:

	build (source → wasm)	run (wasm → output)
first run	full compile in-sandbox	full JIT from scratch
repeat run	cache hit — `:cached`	wasmtime cache — no JIT

The verdict: a wave's first pass pays for compilation; every pass after pays almost nothing. This is why the ten-minute slot exists at all — it's sized for that expensive first compile, not for steady state. In steady state the same wave is gone in a blink.
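The first of those two caches, the content-addressed build cache, fits in a few lines. This is a Python sketch with assumptions stated up front: SHA-256 as the hash, the `<digest>.wasm` filename, and the `build` and `fake_compile` names are all illustrative, since the text above only says that source hashes to a filename. The wasmtime compilation cache is a separate layer and is not modeled here.

```python
import hashlib
import tempfile
from pathlib import Path


def build(source, cache_dir, compile_fn):
    """Return (status, path): 'cached' on a hit, 'built' after a fresh compile."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()  # hash choice is an assumption
    artifact = cache_dir / f"{digest}.wasm"  # identical source -> identical filename
    if artifact.exists():
        return "cached", artifact
    artifact.write_bytes(compile_fn(source))
    return "built", artifact


compile_calls = []


def fake_compile(source):
    # Stand-in for the real compiler; records each call so hits are visible.
    compile_calls.append(source)
    return b"\x00asm" + source.encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    first = build("fn main() {}", Path(tmp), fake_compile)[0]
    second = build("fn main() {}", Path(tmp), fake_compile)[0]
print(first, second, len(compile_calls))  # built cached 1
```

Because the key is the content and not the component's name, editing one character of source is a miss and renaming a component is still a hit.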

what comes BACK

A run returns a record, not a log. For each workflow it's a map: the workflow name, its schedule, the world's exports, a tasks map of name → output, and a list of sub_workflows. Nested :workflow: headlines recurse — each sub-world runs its own waves and produces its own record. Here is a real Pipeline world's result, abridged:

{
  "workflow": "Pipeline",
  "schedule": { "...": "..." },
  "exports": ["..."],
  "tasks": {
    "Upper": "{\"upper\":\"HELLO WORLD\"}",
    "Count": "{\"len\":11}"
  },
  "sub_workflows": [
    { "workflow": "Audit",
      "tasks": { "Echo": "audited:hello world" } }
  ]
}

Two facts in that record earn their place. Upper's stdout literally was Count's stdin: the pipe, visible in the output. And the sub-workflow Audit ran its own Echo step on hello world, producing audited:hello world, which proves a sharp boundary: sub-workflows receive the original top-level input, not the parent's outputs. Edges never cross a workflow boundary. A sub-world is a fresh run with the same input, not a downstream consumer of the parent.
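The recursion behind that record can be sketched as follows. It is a Python illustration only: the `steps` and `subs` keys and the `(name, fn, producer)` tuples are a made-up input shape, and the schedule and exports fields are left out to keep it short.

```python
import json


def run_workflow(wf, original_input):
    """Build a run record; every nested workflow is run with the same original input."""
    tasks = {}
    for name, fn, producer in wf["steps"]:  # steps are listed in wave order here
        in_data = tasks[producer] if producer else original_input
        tasks[name] = fn(in_data)
    return {
        "workflow": wf["name"],
        "tasks": tasks,
        # The recursion passes original_input, never the parent's tasks.
        "sub_workflows": [run_workflow(sub, original_input) for sub in wf.get("subs", [])],
    }


def upper(s):
    return json.dumps({"upper": s.upper()}, separators=(",", ":"))


def count(s):
    return json.dumps({"len": len(json.loads(s)["upper"])}, separators=(",", ":"))


pipeline = {
    "name": "Pipeline",
    "steps": [("Upper", upper, None), ("Count", count, "Upper")],
    "subs": [{"name": "Audit", "steps": [("Echo", lambda s: "audited:" + s, None)]}],
}
record = run_workflow(pipeline, "hello world")
print(record["sub_workflows"][0]["tasks"])  # {'Echo': 'audited:hello world'}
```

Echo sees hello world and not Count's output because the recursive call is handed `original_input`; the parent's `tasks` map never leaves its own frame.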

You reach all of this over HTTP. POST /api/workflow with an org string and an input returns the records; add ?plan=1 and you get the schedules and task names without executing — a dry run of the plan. The curl is short:

curl -s -X POST 'https://<engine>/api/workflow' \
  -H 'Content-Type: application/json' \
  -d '{"org": "<the Pipeline org>", "input": "hello world"}'

# dry run: schedules + task names only, nothing executes
curl -s -X POST 'https://<engine>/api/workflow?plan=1' \
  -H 'Content-Type: application/json' \
  -d '{"org": "..."}'

where it BITES

Honesty section. The wave model is small and sharp, and sharp edges cut.

No fan-in. A component has one optional :in, so a step can't read two producers. If C needs both A and B, you can't wire it directly — you add a downstream merge step, or you wait for a typed composition path to make it expressible. Fan-out is free; fan-in is a feature that isn't here yet.

Errors flow as values. A failed step doesn't halt the wave — its task value becomes an error tuple, and the run completes. That's a feature until it isn't: the failure is data you can read in the result. But a consumer downstream of a failed producer receives that error tuple as its input, which is non-binary, which the WASM lane rejects — and today it rejects it with a misleading input_too_large label even when the real cause is the poisoned upstream. The error propagates; the label lies. We'd rather you knew.
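A toy model of that propagation, in Python. Everything here is an assumption made for illustration: `run_step`, the tuple shape, and the string check standing in for the engine's binary-input guard. It reproduces the described symptom; it is not the engine's code.

```python
def run_step(fn, in_data):
    """Run one step; a failure is returned as a value instead of being raised."""
    # Input guard: only strings pass (a stand-in for the "is it a binary" check).
    if not isinstance(in_data, str):
        return ("error", "input_too_large")  # the misleading label described above
    try:
        return fn(in_data)
    except Exception as exc:
        return ("error", str(exc))


def parse(_s):
    raise ValueError("bad json")


tasks = {}
tasks["Parse"] = run_step(parse, "hello world")                    # the real failure
tasks["Count"] = run_step(lambda s: str(len(s)), tasks["Parse"])  # the poisoned consumer
print(tasks)
# {'Parse': ('error', 'bad json'), 'Count': ('error', 'input_too_large')}
```

The practical reading rule falls out of the sketch: when a step reports input_too_large, look at its producer's entry first, because the earliest error in the chain is the one that actually happened.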

Cycles aren't validated. Validation checks two things — a component with no source, and an :in with no upstream producer. It does not check for cycles. When the executor can't find any ready step but steps remain, it lumps all the survivors into one final wave and runs them at once with nil input — which the WASM lane rejects as non-binary. So a cycle doesn't deadlock or hang; it degenerates into a final all-at-once wave that fails at the input guard. No hang, but a confusing failure rather than a clean "you have a cycle."
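Here is the degenerate case as a Python sketch, alongside a pre-flight check you could run on your own side. Both function names are hypothetical, the fallback mirrors the behavior described above and is not copied from the engine, and `has_cycle` leans on Python's standard `graphlib` and is not something the engine is described as providing.

```python
from graphlib import CycleError, TopologicalSorter


def compute_waves_with_fallback(steps, edges):
    """Peel waves; if nothing is ready but steps remain, lump the survivors into one last wave."""
    preds = {s: set() for s in steps}
    for producer, consumer in edges:
        preds[consumer].add(producer)
    waves, done, remaining = [], set(), list(steps)
    while remaining:
        ready = [s for s in remaining if preds[s] <= done]
        if not ready:
            waves.append(remaining)  # the all-at-once final wave
            break
        waves.append(ready)
        done.update(ready)
        remaining = [s for s in remaining if s not in done]
    return waves


def has_cycle(steps, edges):
    """Client-side check: True if the dependency graph contains a cycle."""
    graph = {s: set() for s in steps}
    for producer, consumer in edges:
        graph[consumer].add(producer)
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError when a cycle exists
        return False
    except CycleError:
        return True


steps = ["a", "b", "c"]
edges = [("b", "c"), ("c", "b")]  # b and c feed each other; a is a root
cyclic_waves = compute_waves_with_fallback(steps, edges)
print(cyclic_waves, has_cycle(steps, edges))  # [['a'], ['b', 'c']] True
```

Running a check like `has_cycle` before you POST the org gives you the clean "you have a cycle" message up front.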

The constants are constants. Eight-wide concurrency, the ten-minute slot, the thirty-second step timeout — none of these are per-workflow knobs today. And sub-workflows getting the original input (not the parent's outputs) is a real constraint, not a bug: it means you compose across worlds by other means, not by piping a parent's results into a child.

questions people actually ASK

How do I run two agents in parallel?

Omit the edges. Two agent components with no :in pointing at each other have no predecessors, so they land in the same wave and run simultaneously under the eight-wide limit. Each gets the workflow input as its task; each lands as an entry in tasks. Parallelism is the absence of a dependency, not a flag.

What if a step hangs?

It's caught by two nets. A WASM step has a thirty-second timeout and a five-billion fuel cap, so a spinning component is killed. The wave slot has a ten-minute ceiling above that. Either way the step's value becomes an error tuple, the run completes, and the failure is data in the result — not a hung process you have to go find.

Can step C read both A and B?

Not directly: a component has at most one inbound edge. To merge two producers, add a downstream step that reads one of them and reaches the other another way, or restructure so a single producer carries the merged payload. Fan-in is a known gap; fan-out (many consumers, one producer) is free.

Is this just Airflow?

No broker, no worker pool, no executor config. The plan file is the spec, and about seventy lines of Elixir partition it into waves and run them. Airflow makes parallelism something you provision; here it's computed from the edges you already wrote. The whole executor is those seventy lines, so there isn't room for an Airflow in it.

Does the schedule fire the workflow on its own?

A world's schedule is surfaced in the plan and the run record — you can read "every day at six" off it. But autonomous firing is the engine's keeper tier, not a cron loop hidden in the wave engine. Treat the schedule as declared intent that a keeper acts on, not as a guarantee that the wave engine is watching the clock.

What happens to an unconsumed output?

It becomes an export. Any component :out that no other component names as its :in is collected into the world's exports — the results you read off the top of the run. The upgrade gate protects those: exports may grow but never shrink. The wave contract is exactly what that gate guards.
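Both rules reduce to set arithmetic. The Python below is a sketch under assumptions: components are modeled as dicts with `in` and `out` names, and `world_exports` and `upgrade_allowed` are illustrative names, not the engine's API.

```python
def world_exports(components):
    """Exports are the :out names that no component names as its :in."""
    consumed = {c["in"] for c in components.values() if c.get("in")}
    return sorted(
        c["out"] for c in components.values() if c.get("out") and c["out"] not in consumed
    )


def upgrade_allowed(old_exports, new_exports):
    """The gate as described: exports may grow but never shrink."""
    return set(old_exports) <= set(new_exports)


components = {
    "Upper": {"in": None, "out": "upper"},
    "Count": {"in": "upper", "out": "len"},
}
current = world_exports(components)
print(current)                                     # ['len']
print(upgrade_allowed(current, ["len", "audit"]))  # True
print(upgrade_allowed(current, ["audit"]))         # False
```

Note the side effect of the first rule: wiring a new consumer onto an existing export removes it from the export list, which is exactly the kind of shrink the gate would refuse.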

keep GOING

Waves are how a compiled world runs. The parent has the plan; the siblings have the world, the nesting, and the worker.

Workflows: rounds → waves, the explicit handoff

Worlds: what tangle_plan emits for waves to run

Nesting: sub-worlds, imports, the upgrade gate

Agents: the loop an agent step invokes