the tool that hangs in CI
Everyone has been burned by the same command-line tool. It runs fine on
your laptop. You wire it into a CI job, and the build sits there for twenty
minutes — silently, at a y/n prompt nobody will ever answer.
You fix that, and now your logs are full of ANSI escape codes, because the
tool tried to paint you a color table the log viewer can't read. And when
something finally fails, it exits 1 — for everything, always —
with a paragraph of prose you have to grep to find out
why.
The industry's usual answer is to ship two things: a CLI for humans and an SDK or HTTP API for machines. They start aligned and drift apart — the flag the CLI grew last week isn't in the SDK, the error the SDK returns isn't the one the CLI prints. An ecosystem with one tool — one binary on the laptop, in CI, and inside the engine — can't take that fork. It needs the single command to be safe for an agent and pleasant for a person without becoming two programs that disagree.
wbx solves this with manners, not forks. The work is one code path; the
audience is decided once, at the very end. The SPEC is candid about where
this started: before the mode model, wbx had zero interactivity — accidentally
agent-safe, deliberately human-poor; errors were plain strings with exit code
1 for everything; no --json anywhere; no color. The
fix wasn't a rewrite. It was a seam.
the DEFINITION
mode (n.) 1. the manner wbx adopts for its audience (one of Human, Agent, or Json), chosen once per run. Three manners, one code path: the command does identical work and returns the same value; the mode decides only how that value is spoken.
It's a single enum — Mode { Human, Agent, Json } — and the
SPEC's framing for it is exactly four ideas long: one switch, three ways
to set it, sensible default. The guiding rule underneath is that human
mode should be dumb-simple — the tool does the work, and a learning curve is
treated as a bug. The same instinct, pointed the other way, is what makes
the machine modes trustworthy.
eleven lines that DECIDE
The whole human/agent split begins in one function, detect,
and the order of its checks is the entire contract. Read it top to bottom —
the first match wins:
flowchart TB
start(["wbx verb runs"]) --> j{"--json passed?"}
j -- yes --> J["Json — the envelope"]
j -- no --> a{"--agent, or<br/>WBX_AGENT=1 ?"}
a -- yes --> A["Agent — plain text, no prompts"]
a -- no --> t{"stdout is a terminal?<br/>is_terminal()"}
t -- yes --> H["Human — landing, color, prompts"]
t -- no --> A2["Agent — the safe default"]
style J fill:#a8d4f0,stroke:#121316
style A fill:#d9dbd3,stroke:#121316
style A2 fill:#d9dbd3,stroke:#121316
style H fill:#aee5c2,stroke:#121316
style start fill:#ffffff,stroke:#121316
The precedence is --json first, then --agent
or WBX_AGENT=1, then the terminal test, and if none of those
fire — piped, redirected, no TTY — the fallback is Agent. That last
choice is the important one. When wbx can't tell who's listening, it does
not assume a human is there to answer a prompt. The safe failure for
an unknown caller is silence and plain text, never a hang. A pipe is
agent-by-default precisely because a pipe might be a CI runner with no
keyboard attached.
The terminal test is std::io::IsTerminal — standard library,
no crate, one call on stdout(). The two flags are global clap
arguments: --agent forces agent mode (never prompts, no ANSI),
and --json is agent mode plus a single JSON envelope on
stdout. And one honest nit worth stating plainly: WBX_AGENT must
be exactly the string 1. WBX_AGENT=true does nothing
— the check is a literal v == "1". Set it to 1 or
don't set it.
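The precedence above can be sketched as a small pure function plus a thin wrapper. This is a minimal sketch, not the wbx source: the helper name `decide` and its parameters are assumptions introduced here so the ordering can be exercised without a real terminal; only `Mode`, `detect`, `WBX_AGENT`, and `IsTerminal` come from the text.

```rust
use std::io::IsTerminal;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Human,
    Agent,
    Json,
}

/// Hypothetical helper: the precedence rules with every input passed in,
/// so the order of checks can be tested without a TTY or real env vars.
/// `env_agent` is the raw value of WBX_AGENT, if it is set at all.
pub fn decide(json: bool, agent: bool, env_agent: Option<&str>, stdout_is_tty: bool) -> Mode {
    // 1. --json wins over everything else.
    if json {
        return Mode::Json;
    }
    // 2. --agent, or WBX_AGENT set to exactly the string "1".
    if agent || env_agent == Some("1") {
        return Mode::Agent;
    }
    // 3. A real terminal on stdout means a human; anything else falls
    //    back to Agent, the safe default for an unknown caller.
    if stdout_is_tty {
        Mode::Human
    } else {
        Mode::Agent
    }
}

/// Sketch of the wrapper that gathers the real inputs.
pub fn detect(json: bool, agent: bool) -> Mode {
    let env_agent = std::env::var("WBX_AGENT").ok();
    decide(json, agent, env_agent.as_deref(), std::io::stdout().is_terminal())
}
```

Splitting the decision from the input gathering is a design choice of this sketch only; it keeps the "first match wins" order visible in one place.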
One verb, three manners, live:
$ wbx status # TTY → human: landing + a start-here verb menu
$ wbx status | cat # piped → agent: structured body, no menu, no color
$ WBX_AGENT=1 wbx status # agent manners ON a terminal
$ wbx --json status # the envelope: {"ok":true,"verb":"status","data":{…}}
The agent landing has a small extra kindness: it splices the whole verb tree into the doctor body. One call, and an agent knows both the system's health and its entire surface.
one channel, rendered ONCE
Detection is cheap because of a discipline upstream of it: every command
in wbx returns a single String, and main is the
only thing that renders. The shape of main is small enough to
hold in your head:
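// Decide the audience once, from the two global flags.
let m = mode::detect(cli.json, cli.agent);
// Rebuild the verb name for the envelope from the raw arguments.
let verb = mode::verb_path();
// Every verb returns one String or one error; rendering happens only here.
match run(cli, m == Mode::Human) {
    Ok(out) => render_ok(m, &verb, &out),
    Err(e) => exit(render_err(m, &verb, &e)),
}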
That's the seam. The SPEC calls it one output channel — good, and credits it with making the entire mode model a mechanical refactor rather than a rewrite. Because the verb already produced one string, adding three manners meant adding one renderer at the exit, not editing every command.
sequenceDiagram
participant U as you / agent
participant C as clap (parse + verb)
participant R as run() (one verb)
participant M as mode::render
U->>C: wbx toolkit audit --json
C->>R: dispatch
R-->>M: Ok(String) / Err(anyhow)
Note over M: detect() chose the mode<br/>render_ok / render_err runs ONCE
M-->>U: stdout (text or envelope)
M-->>U: stderr (hint, if human)
M-->>U: exit code
The story of that diagram: you or an agent type one command; clap parses
it and picks the verb; run() does the real work and hands back
either an Ok(String) or an error; then — and only then — the
mode layer renders that result exactly once, sending output to stdout, any
hint to stderr, and setting the exit code. Nothing branches on audience until
that final box.
Two supporting details. First, verb_path() rebuilds the verb
name for the envelope straight from std::env::args(): it skips
flags, takes the first real word, and for group verbs — publish,
toolkit, workflow, agent, workbook, deploy (a real constant) — it
reaches in for the sub-verb too. So the envelope reports
"toolkit audit", not just "toolkit"; there is a test
that pins exactly that string. Second, the commands that genuinely need to
render different content per audience — like doctor — take
a human: bool and shape their data accordingly: prose for people,
a structured body for agents. Everything else is identical work, spoken three
ways.
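The verb-name reconstruction described above could look roughly like this. It is a sketch under stated assumptions: `verb_path_from` and `GROUP_VERBS` are names invented here, the group list is copied from the prose and may not match the real constant, and treating every argument that starts with `-` as a flag is a simplification (a flag that takes a separate value would confuse it).

```rust
/// Group verbs as named in the prose; the real constant may differ.
const GROUP_VERBS: [&str; 6] = ["publish", "toolkit", "workflow", "agent", "workbook", "deploy"];

/// Hypothetical pure helper: rebuild the verb name from the arguments
/// that follow the program name.
pub fn verb_path_from(args: &[&str]) -> String {
    // Skip flags: anything starting with '-' is not a verb word.
    let mut words = args.iter().copied().filter(|a| !a.starts_with('-'));

    // The first real word is the verb; no words means no verb.
    let Some(first) = words.next() else {
        return String::new();
    };

    // For group verbs, reach in for the sub-verb as well.
    if GROUP_VERBS.contains(&first) {
        if let Some(sub) = words.next() {
            return format!("{first} {sub}");
        }
    }
    first.to_string()
}

/// Sketch of the wrapper that reads the real process arguments.
pub fn verb_path() -> String {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let refs: Vec<&str> = args.iter().map(String::as_str).collect();
    verb_path_from(&refs)
}
```

With these assumptions, `wbx toolkit audit --json` yields `toolkit audit` and `wbx --json status` yields `status`.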
one shape for every OUTCOME
In Json mode, success and failure share a single, predictable skeleton —
the thing a script parses once and trusts forever. Success is
{ ok: true, verb, data }; failure is { ok: false, verb,
error: { code, message, hint, retryable } }. Here is a real failure,
no engine reachable:
$ WBX_ENGINE_URL=http://127.0.0.1:1 wbx --json rt status
{"ok":false,"verb":"rt status","error":{"code":3,
"message":"…tcp connect error 127.0.0.1:1…",
"hint":"no engine reachable — start one with `wbx deploy local` or set WB_ENGINE_URL",
"retryable":true}}
$ echo $?
3
Three things in that envelope earn a closer look. The data
field is embedded structurally when it can be: render_ok
tries to parse the command's output as JSON, and if the command already
produced JSON, it nests it as real structure; only if it's plain text does it
wrap it as { "text": out }. You never get JSON-inside-a-string
when real JSON was available. The retryable flag is
computed, not hand-set — it's true exactly when the code
is EXIT_ENGINE (3) or EXIT_CONFLICT (6), the two
failures a retry can plausibly fix. And note where the envelope rides: even on
failure it goes to stdout, not stderr — because in Json mode the
envelope is the answer, success or not. (In Human and Agent mode,
errors still go to stderr the normal way; the stdout-failure rule is specific
to the envelope.)
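The "computed, not hand-set" rule for retryable can be sketched as below. The constant names and values come from the text; the `ErrorBody` struct, its constructor, and the choice to model `hint` as optional are assumptions made for illustration. The JSON-nesting behavior of render_ok is left out because it needs a JSON parser.

```rust
pub const EXIT_ENGINE: i32 = 3;
pub const EXIT_CONFLICT: i32 = 6;

/// True exactly for the two failures a retry can plausibly fix.
pub fn is_retryable(code: i32) -> bool {
    matches!(code, EXIT_ENGINE | EXIT_CONFLICT)
}

/// Hypothetical in-memory shape of the `error` object in the envelope.
#[derive(Debug, PartialEq)]
pub struct ErrorBody {
    pub code: i32,
    pub message: String,
    pub hint: Option<String>,
    pub retryable: bool,
}

impl ErrorBody {
    /// There is no `retryable` parameter on purpose: the flag is always
    /// derived from the code, so the two fields cannot disagree.
    pub fn new(code: i32, message: &str, hint: Option<&str>) -> Self {
        ErrorBody {
            code,
            message: message.to_string(),
            hint: hint.map(str::to_string),
            retryable: is_retryable(code),
        }
    }
}
```

Deriving the flag inside the constructor is what keeps it from drifting out of sync with the code beside it.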
classify, the honest HEURISTIC
A depth-rung — skippable on a first read, essential if you're writing the retry loop.
The exit codes are a small, fixed set: 0 for ok; 2
for a usage error (owned by clap, not by wbx — there's a test pinning that);
and a handful of typed failures — EXIT_ENGINE=3,
EXIT_NOT_FOUND=4, EXIT_VERIFY=5,
EXIT_CONFLICT=6, EXIT_AUTH=7. Everything that doesn't
match a category falls to 1, the catch-all.
How does a code get chosen? Today, by classify() — and the
source is refreshingly blunt about what it is: a heuristic over the lowercased
text of the error chain, until errors carry typed codes; the contract is
the code, not the text. That sentence is the most important thing on this
page for a script author. You branch on the number. You never parse the
message. The string matching is an implementation detail on its way to being
replaced; the code is the promise.
| error text contains… | code | the hint it carries |
|---|---|---|
| connection refused · error sending request · runtime.json · no runtime · tcp connect | 3 | no engine reachable — start one with wbx deploy local or set WB_ENGINE_URL |
| 404 · not found · no such file | 4 | names the list verbs — wbx library, wbx toolkit list, wbx workbook list |
| signature · integrity · verif | 5 | re-sign with wbx sign |
| 409 · conflict | 6 | state moved underneath you — re-read and retry |
| 401 · unauthorized · 403 · forbidden | 7 | set WB_ENGINE_TOKEN, or re-run wbx deploy local |
The verdict of that table: each failure class names the command that fixes
it, and the rows for codes 3 and 6 are the ones retryable
marks true. A 4 means enumerate what exists; a 5 means re-sign; a
7 means fix your token and never retry blindly. This table is different from
the broader what-a-script-should-do table on the parent page:
that one stays there; this one shows the seam underneath it.
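The table above can be read as a substring classifier. What follows is a minimal sketch of that idea, not the real classify(): the needles and codes are copied from the table, but the order of the checks, the helper `has_any`, and the name `EXIT_GENERIC` for the catch-all are assumptions.

```rust
pub const EXIT_GENERIC: i32 = 1; // hypothetical name for the catch-all
pub const EXIT_ENGINE: i32 = 3;
pub const EXIT_NOT_FOUND: i32 = 4;
pub const EXIT_VERIFY: i32 = 5;
pub const EXIT_CONFLICT: i32 = 6;
pub const EXIT_AUTH: i32 = 7;

/// True if the text contains any of the given needles.
fn has_any(text: &str, needles: &[&str]) -> bool {
    needles.iter().any(|n| text.contains(*n))
}

/// Heuristic: map the lowercased error text to an exit code.
/// Rows are checked in table order (an assumption); first match wins.
pub fn classify(error_chain: &str) -> i32 {
    let text = error_chain.to_lowercase();
    if has_any(&text, &["connection refused", "error sending request", "runtime.json", "no runtime", "tcp connect"]) {
        EXIT_ENGINE
    } else if has_any(&text, &["404", "not found", "no such file"]) {
        EXIT_NOT_FOUND
    } else if has_any(&text, &["signature", "integrity", "verif"]) {
        EXIT_VERIFY
    } else if has_any(&text, &["409", "conflict"]) {
        EXIT_CONFLICT
    } else if has_any(&text, &["401", "unauthorized", "403", "forbidden"]) {
        EXIT_AUTH
    } else {
        EXIT_GENERIC
    }
}
```

The sketch also shows why the text is not the contract: any rewording of an upstream error message can move it to a different row, while the set of codes stays fixed.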
manners vs FORMAT
This is the page's sharpest distinction, and the one most people get
wrong. Piping wbx does not give you the JSON envelope. A pipe selects
Agent mode, which is manners — no prompts, no color, plain
text. The envelope is a format, and you get it only by asking:
--json. Agent mode is how wbx behaves; Json mode is what it
emits. There is a contract test whose entire job is to prove that a piped
command's output does not start with {"ok".
| | Human (TTY) | Agent (piped / --agent) | Json (--json) |
|---|---|---|---|
| prompts? | yes — pickers | never | never |
| color / ANSI? | contract promises it | none | none |
| success output | landing + verb menu | the plain string | {ok,verb,data} envelope |
| failure output | wbx: {err} + hint | wbx: {err}, no hint | {ok:false,…,error} |
| where's the hint? | stderr, under the error | dropped | structural, in the envelope |
| exit code | 0–7 | 0–7 | 0–7 |
The hint routing in that table is the subtle part. The hint is for whoever
can act on it, in the channel they actually read. A human gets it on stderr,
right under the error, where their eyes already are. An agent in plain mode
gets only wbx: {err} — no hint line, because plain agent output
is for piping, not advising. And Json mode carries the hint structurally, so
the consumer that parses the envelope can act on it programmatically. Same
hint, three deliveries — or none, when none would be noise.
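The hint routing can be sketched as one function that returns what goes to each stream. This is an illustration under assumptions: the `Rendered` struct, the parameter list, the exact layout of the human hint line, and the hand-written `json_string` escaper are all invented here; the real renderer's signature appears in the text only as render_err(m, &verb, &e).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Human,
    Agent,
    Json,
}

/// Hypothetical result: what would be written where, plus the exit code.
#[derive(Debug, PartialEq)]
pub struct Rendered {
    pub stdout: String,
    pub stderr: String,
    pub code: i32,
}

/// Minimal JSON string escaper so the sketch needs no external crate.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

pub fn render_err(mode: Mode, verb: &str, code: i32, message: &str, hint: &str) -> Rendered {
    match mode {
        // Human: error on stderr with the hint right under it.
        Mode::Human => Rendered {
            stdout: String::new(),
            stderr: format!("wbx: {message}\n  hint: {hint}\n"),
            code,
        },
        // Agent: the error line only; the hint is dropped.
        Mode::Agent => Rendered {
            stdout: String::new(),
            stderr: format!("wbx: {message}\n"),
            code,
        },
        // Json: the failure envelope rides stdout and carries the hint.
        Mode::Json => {
            let retryable = matches!(code, 3 | 6);
            let envelope = format!(
                "{{\"ok\":false,\"verb\":{},\"error\":{{\"code\":{},\"message\":{},\"hint\":{},\"retryable\":{}}}}}\n",
                json_string(verb),
                code,
                json_string(message),
                json_string(hint),
                retryable
            );
            Rendered { stdout: envelope, stderr: String::new(), code }
        }
    }
}
```

The exit code is the same in all three arms; only the channel and the shape of the text change.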
what only humans GET
A depth-rung — the two features that live entirely behind the mode gate.
Some things are luxuries a machine must never reach. wbx keeps two of them, and both are gated so hard that agent mode literally cannot trip over them.
The first is pick() — a hand-rolled picker that prints its
menu to stderr and reads one line from stdin. Its doc comment is a
warning to future maintainers: HUMAN mode only — callers must gate on mode;
agent mode never reaches this. It has exactly one real call site,
wbx deploy init run bare on a terminal, which asks where the
engine should live:
? engine place — where does this run?
› 1) local a container on this machine — cloud-identical
2) cloud fly.io, under your own account
[1]
Pipe that same command with stdin closed and there is no hang — it instantly
takes the local default, writes deployment.org, and
answers: wrote deployment.org (local) — edit it, then `wbx deploy
apply`. The gate is enforced twice over: on a non-TTY the picker is never
called, and the wasm32 build of pick() always returns the default
no matter what — the in-sandbox wbx physically cannot prompt.
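The double gate described above might be sketched like this. Everything here is illustrative: the real pick() reads stdin and writes stderr directly, whereas this version takes a reader and a writer as parameters so it can be exercised without a terminal, and `choose_place`, the menu layout, and the option list are assumptions.

```rust
use std::io::{BufRead, Write};

/// HUMAN mode only: print a numbered menu to `err`, read one line from
/// `input`, and return the chosen index (or `default` on anything odd).
#[cfg(not(target_arch = "wasm32"))]
pub fn pick<R: BufRead, W: Write>(
    mut input: R,
    mut err: W,
    prompt: &str,
    options: &[&str],
    default: usize,
) -> usize {
    let _ = writeln!(err, "? {prompt}");
    for (i, opt) in options.iter().enumerate() {
        let _ = writeln!(err, "  {}) {opt}", i + 1);
    }
    let _ = write!(err, "[{}] ", default + 1);
    let _ = err.flush();

    let mut line = String::new();
    if input.read_line(&mut line).is_err() {
        return default;
    }
    // Empty input, closed stdin, or an out-of-range number: take the default.
    match line.trim().parse::<usize>() {
        Ok(n) if n >= 1 && n <= options.len() => n - 1,
        _ => default,
    }
}

/// On wasm32 the picker never touches its streams: always the default.
#[cfg(target_arch = "wasm32")]
pub fn pick<R: BufRead, W: Write>(
    _input: R,
    _err: W,
    _prompt: &str,
    _options: &[&str],
    default: usize,
) -> usize {
    default
}

/// Hypothetical call site: the caller gates on mode before picking.
pub fn choose_place<R: BufRead, W: Write>(human: bool, input: R, err: W) -> &'static str {
    const PLACES: [&str; 2] = ["local", "cloud"];
    if !human {
        return PLACES[0]; // agent mode never reaches the picker
    }
    PLACES[pick(input, err, "engine place", &PLACES, 0)]
}
```

The first gate is the `human` check at the call site; the second is the `cfg` attribute, which removes the prompting body from the wasm32 build entirely.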
The second luxury is next_hint() — verb chaining, printed to
stderr as → next: …, human only. After a successful
deploy init on a terminal, the output grows a tail an agent never
sees: → next: wbx deploy apply (converge to what you declared).
The chains are short and real: build → bundle → sign → (verify or workbook
deploy); deploy init → apply → status/logs; toolkit build → push; workbook
deploy → list.
flowchart LR
b["build"] --> bu["bundle"] --> s["sign"] --> v["verify"]
s --> wd["workbook deploy"]
di["deploy init"] --> ap["apply"] --> st["status / logs"]
style b fill:#aee5c2,stroke:#121316
style di fill:#aee5c2,stroke:#121316
Read that left to right as a human's guided path: finish a build and wbx whispers now bundle; bundle and it says now sign; sign and it offers two ways forward — verify it, or deploy the workbook. Init a deploy and the chain points you to apply, then to status and logs. An agent gets none of this, and wants none of it — it already knows where it's going.
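The chains above amount to a small lookup table behind the human gate. A minimal sketch, assuming a function shape the text does not give: `next_verbs` and the `human` parameter are invented here, the step names are copied from the prose as written, and the real hint carries fuller text (for example the full command and a parenthetical) than this version prints.

```rust
/// Next steps for a verb, using the step names exactly as the prose lists them.
pub fn next_verbs(verb: &str) -> &'static [&'static str] {
    match verb {
        "build" => &["bundle"],
        "bundle" => &["sign"],
        "sign" => &["verify", "workbook deploy"],
        "deploy init" => &["apply"],
        "apply" => &["status", "logs"],
        "toolkit build" => &["push"],
        "workbook deploy" => &["list"],
        _ => &[],
    }
}

/// Human only: an agent never gets a hint, and a verb with no chain gets none.
pub fn next_hint(human: bool, verb: &str) -> Option<String> {
    if !human {
        return None;
    }
    let next = next_verbs(verb);
    if next.is_empty() {
        return None;
    }
    Some(format!("→ next: {}", next.join(" or ")))
}
```

A caller would print the returned line to stderr after a successful run, which keeps stdout identical across modes.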
the contract is a TEST suite
None of this is documentation-only. Every promise on this page has an
assertion behind it in cli/tests/mode.rs. That file is the real
contract; the prose here is a tour of it.
| the promise | the test that pins it |
|---|---|
| success is one envelope, ok=true, verb + data present | json_envelope_on_success |
| not-found exits 4 and the envelope's code is 4 | json_envelope_and_code_on_not_found |
| a pipe is plain text: output never starts with {"ok" | piped_default_is_plain_text_no_envelope |
| deploy init with stdin closed never prompts; writes the file | agent_mode_deploy_init_never_prompts |
| WBX_* beats WB_* when both are set | wbx_env_aliases_win_over_wb |
| open on a missing path → exit 4 | status_is_the_landing_and_open_maps_not_found |
| bare wbx is a landing, not a usage error — exit 0 | bare_wbx_is_a_landing_not_a_usage_error |
| the classify table itself maps to the contract codes | classify_maps_the_contract_codes |
The verdict of that table: the most agent-hostile failure modes — a hang on
a closed stdin, an envelope that disagrees with its own exit code, a pipe that
vomits JSON you didn't ask for — are each a named, failing-if-broken test. The
never-prompts test is the strongest: it runs wbx deploy init with
stdin set to Stdio::null(), asserts the command succeeds, and
asserts deployment.org exists on disk afterward. Closed stdin, no
hang, file written — proven, not promised.
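The same check can be reproduced from outside the repository with only the standard library. This is a sketch of the idea, not the contents of cli/tests/mode.rs: both function names are hypothetical, and it assumes deployment.org is written into the working directory the command runs in.

```rust
use std::io;
use std::path::Path;
use std::process::{Command, Output, Stdio};

/// Run a program in `dir` with stdin attached to the null device, so any
/// attempt to read a prompt answer sees end-of-input immediately.
pub fn run_with_closed_stdin(program: &str, args: &[&str], dir: &Path) -> io::Result<Output> {
    Command::new(program)
        .args(args)
        .current_dir(dir)
        .stdin(Stdio::null())
        .output()
}

/// Hypothetical check mirroring the never-prompts promise:
/// the command succeeds and the file exists afterward.
pub fn check_deploy_init_never_prompts(wbx: &str, dir: &Path) -> Result<(), String> {
    let out = run_with_closed_stdin(wbx, &["deploy", "init"], dir)
        .map_err(|e| format!("could not run {wbx}: {e}"))?;
    if !out.status.success() {
        return Err(format!("deploy init failed: {}", out.status));
    }
    if !dir.join("deployment.org").exists() {
        return Err("deployment.org was not written".to_string());
    }
    Ok(())
}
```

The sketch has no timeout, so a tool that waited on something other than stdin would still block it; a real harness would likely add one.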
Two more worth naming because they shape everyday use. doctor
always exits 0, even with no engine reachable — a health check
must not fail the shell, and a test enforces it. And the author verbs accept
- for stdin: echo "* hello" | wbx lint - returns
[], which is the seam the sibling pipelines
lesson is built on.
where the seam is HONEST
The mode model is small and tested, but it isn't finished, and pretending otherwise would undercut the whole point of a contract. Five honest edges:
- classify is string-matching, for now. Codes are derived from the text of the error chain. Typed error codes are the future; the code is the contract today, and that's exactly why you branch on the number and never on the message.
- exit 2 belongs to clap. Usage errors are emitted by the argument parser, not by classify(): a proven, deliberate boundary.
- 1 is the catch-all. Anything that doesn't match a typed category lands on 1. Treat it as unknown, not as a specific failure.
- WBX_AGENT must be exactly 1. Not true, not yes: the literal string 1.
- the failure envelope rides stdout. In Json mode, errors don't go to stderr; the envelope goes to stdout, so a single parse covers both outcomes.
One thing we won't claim: whether Human mode paints color and tables today. The SPEC promises that DX — color, tables, pickers, a landing — and the pickers and landing are real and tested. The color we won't overstate. What we will stand behind, fully, is the inverse and tested half of the contract: agent mode emits no ANSI, ever. That's the half a script depends on, and that half is law.
questions people actually ASK
Why does piping give me plain text and not JSON?
Because a pipe selects Agent mode — which is manners, not format. Agent
mode means no prompts and no color; it does not mean the envelope. The
envelope is a separate request: add --json. There's a test that
proves a bare pipe's output never starts with {"ok", so this is
a guarantee, not an accident.
Is the error message stable enough to parse?
No, and you shouldn't try. The message is the heuristic's input and may change.
Branch on error.code in the envelope, or on the process exit
code in a shell. The code is the contract; the text is for humans.
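On the script side, branching on the number might look like the sketch below. The code meanings are taken from this page; the `Action` enum, `action_for`, and `run_with_retry` are names invented for illustration, and the retry loop has no backoff or delay.

```rust
/// Hypothetical caller-side decision, derived only from the exit code.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Done,      // 0: success
    FixUsage,  // 2: usage error from the argument parser
    Retry,     // 3 or 6: engine unreachable, or conflict
    Enumerate, // 4: not found, so list what exists
    Resign,    // 5: verification failed, so re-sign
    FixAuth,   // 7: fix the token; never retry blindly
    Unknown,   // 1 and anything else: treat as unknown
}

pub fn action_for(exit_code: i32) -> Action {
    match exit_code {
        0 => Action::Done,
        2 => Action::FixUsage,
        3 | 6 => Action::Retry,
        4 => Action::Enumerate,
        5 => Action::Resign,
        7 => Action::FixAuth,
        _ => Action::Unknown,
    }
}

/// Call `run` until it returns a non-retryable code or the attempt budget
/// is spent. `run` always executes at least once. Returns the last code.
pub fn run_with_retry(max_attempts: u32, mut run: impl FnMut() -> i32) -> i32 {
    let mut code = run();
    let mut attempts = 1;
    while attempts < max_attempts && action_for(code) == Action::Retry {
        code = run();
        attempts += 1;
    }
    code
}
```

In practice `run` would spawn the command and return its exit status; nothing in the loop reads the message text.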
How do I force human manners inside a pipe?
You can't. There is an --agent flag and a --json
flag, but no --human flag — Human mode is reached only by a real
terminal on stdout. That's intentional: the human luxuries (pickers, the
→ next chain) are precisely the things that misbehave when
nobody's watching, so the gate is one-way.
What is retryable actually computed from?
The exit code, nothing else. It's true exactly when the code
is 3 (engine unreachable) or 6 (conflict) — the two failures a retry can
plausibly clear. It is never hand-set per command, so it can't drift out of
sync with the code beside it.
Does the old WB_* prefix still work?
Yes. The canonical prefix is WBX_*, and WB_* is
accepted as a fallback — env_var(key) tries WBX_{key}
first, then WB_{key}. When both are set, the WBX_*
spelling wins; a test points each at a different port and proves it. (One hint
string still spells out WB_ENGINE_URL — both work, so don't read
either spelling as exclusive.)
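The fallback order can be sketched with the lookup passed in as a closure. Only `env_var(key)` and the two prefixes come from the text; `env_var_with` is a hypothetical helper added so the precedence can be checked without touching the real environment, and how the real function treats an empty value is not covered here.

```rust
/// Hypothetical core: try WBX_{key} first, then fall back to WB_{key}.
pub fn env_var_with(key: &str, lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    lookup(&format!("WBX_{key}")).or_else(|| lookup(&format!("WB_{key}")))
}

/// Sketch of the wrapper over the real process environment.
pub fn env_var(key: &str) -> Option<String> {
    env_var_with(key, |name| std::env::var(name).ok())
}
```

Under these assumptions, `env_var("ENGINE_URL")` would read `WBX_ENGINE_URL` when it is set and only then consider `WB_ENGINE_URL`.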
How does an agent learn the whole surface in one call?
Bare wbx in agent mode returns the doctor body with the full
verb tree spliced in — health and surface in a single response. And
wbx help --json is a pre-clap intercept that emits the entire
verb tree as an envelope, group verbs and all. One call orients the agent.
keep GOING
Modes are the seam under the command line — start at the parent, then follow the seam outward to the things it makes safe.