one machine, two AUDIENCES
Your Nexus is the most privileged thing you own. The security section of that lesson spent its whole length protecting what lives there: secrets, agents, build tools, the Dock — the seam that runs real capabilities on demand. Now you want to take a single page out of it and put that page on the open internet, where the least-trusted requests on Earth will hammer it.
Every instinct says those two facts are incompatible. The dread is specific and correct: if my published site and my engine share a machine, a request to the site is a request to the engine. So the conventional answer is a stack — nginx in front to terminate and route, a separate static host on the side to serve the safe bytes, a cert manager like Caddy to juggle domains, and a proxy config whose entire job is to keep the public traffic from ever reaching the privileged ports. Three machines and a config file standing between disaster and you, and any one of them misconfigured is the incident.
This lesson is the other answer. Same box, both audiences, and the wall between them is not a proxy you can misconfigure. It's the absence of a road.
two planes, one NODE
A plane is a complete, separate HTTP surface, with its own listener, its own router, and its own trust posture, running on the same node as the others. The control plane is authenticated and does things; the content plane is anonymous, GET-only, and serves bytes.
Two separate Plug routers on two separate Bandit listeners, in one BEAM
node: Workbooks.Web for control, Workbooks.PublicWeb
for content. They share a process, a volume, and a runtime — and share almost
nothing else. The control plane is where deploys, the Dock, agents, builds,
and key escrow live, every one of them behind one authentication plug. The
content plane has none of that — and, more importantly, has no code that
could do any of it.
| plane | module | listener env | default port | auth | what exists there |
|---|---|---|---|---|---|
| control | Workbooks.Web | WB_WEB=1 | 4000 | authed (whole router) | /api/*, /w/:id/call (Dock), build, commands, agents, key escrow |
| content | Workbooks.PublicWeb | WB_PUBLIC=1 | 4001 | none — anonymous | published bytes, GET only, /_changes, /_activity, /health |
That table is the whole design of the split, stated honestly: one plane to operate the engine, one plane to publish from it, and a hard line drawn so an anonymous request on the second can never become an authenticated action on the first.
security by routes that DON'T exist
Here's the move that makes the whole thing safe, and it's not the move you'd
guess. The content plane isn't the engine with auth bolted on. There is no
Workbooks.Auth plug on PublicWeb at all — and there
doesn't need to be, because there's nothing on that plane to protect. No Dock
route. No build. No commands. No agents. No secret escrow. Those code paths
are not guarded on the content plane; they are absent from it. The
module never calls into them, so they cannot be reached, misconfigured into
reach, or fuzzed into reach.
And the router is GET-only by construction. Every clause is a get;
anything that isn't a GET falls straight through to a single catch-all that
answers 404. A POST doesn't get rejected by a rule — it has
nothing to match, so it cannot do anything. Contrast the one line that opens
the control plane's router:
# runtime/host/web.ex: the control plane
plug(Workbooks.Auth)   ← gates the ENTIRE router. Everything below needs a credential.

# runtime/host/public_web.ex: the content plane
plug(:match)           ← no Auth. Just route, dispatch, and a GET-only surface.
plug(:dispatch)
match _, do: send_resp(conn, 404, "not found")   ← non-GET lands here. Nowhere else to go.
The payoff is for whoever does your security review. The question "what can an anonymous request reach on the public internet?" is answered by reading one file end to end — about 530 lines — and confirming the dangerous verbs simply aren't in it. Not a rule set to audit, not a policy to verify line by line. Absence is the easiest property in the world to check.
flowchart TD
req["inbound request"]
req --> which{"which plane?
(by listener / host)"}
which -- "control :4000" --> auth{"auth ladder
passes?"}
auth -- "yes" --> rich["Dock · build · agents · keys
the rich, dangerous surface"]
auth -- "no" --> n401["401 — no dev fallback"]
which -- "content :4001" --> verb{"GET?"}
verb -- "yes" --> resolve["resolve host → app → bytes"]
verb -- "no (POST/PUT/…)" --> n404["404 — no route exists"]
style rich fill:#f3c5a3,stroke:#121316
style resolve fill:#aee5c2,stroke:#121316,stroke-width:2.5px
style n401 fill:#ffffff,stroke:#121316
style n404 fill:#ffffff,stroke:#121316
Read the right-hand path: a request lands, the node asks which plane it arrived on, and on the content plane the only branch that goes anywhere is GET → resolve → bytes. Every other verb terminates at 404. There is no arrow from the content plane into the dangerous box on the left, because in the code there is no function to draw it from.
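To make "nothing to match" concrete, here is a minimal sketch in plain Elixir with no Plug involved. `ContentPlaneSketch` and its `route/2` function are hypothetical stand-ins, not the real `Workbooks.PublicWeb`; they only illustrate how a router whose every clause names GET leaves other verbs with exactly one place to land.

```elixir
defmodule ContentPlaneSketch do
  @moduledoc "Toy stand-in for a GET-only router. Not the real Workbooks.PublicWeb."

  # Every real clause matches the literal method "GET". No clause exists
  # for any other verb, so no other verb can select a handler.
  def route("GET", "/health"), do: {200, "ok"}
  def route("GET", "/_changes"), do: {200, "changes feed"}
  def route("GET", "/_activity"), do: {200, "activity feed"}
  def route("GET", path), do: {200, "bytes for " <> path}

  # Catch-all: every non-GET request ends here.
  def route(_method, _path), do: {404, "not found"}
end

IO.inspect(ContentPlaneSketch.route("GET", "/learn/planes"))
IO.inspect(ContentPlaneSketch.route("POST", "/w/demo/call"))
```

The POST is not refused by a policy check. It simply fails to match any of the GET clauses and falls to the catch-all, which is the property the diagram draws.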
how the planes COME UP
Each plane is opt-in, so a bare boot binds nothing — the demo can start without grabbing a port it doesn't need. You turn a plane on with an environment variable, and it listens on its own port:
| env var | turns on | port env | default | notes |
|---|---|---|---|---|
| WB_WEB=1 | control plane | PORT | 4000 | the authed surface |
| WB_PUBLIC=1 | content plane (HTTP) | PUBLIC_PORT | 4001 | anonymous, GET-only |
| WB_PUBLIC_TLS=1 | content plane (HTTPS) | PUBLIC_TLS_PORT | 4443 | adds sni_fun — the per-domain cert lane |
The TLS plane is the interesting one: switching it on installs a
sni_fun into the transport — the per-connection cert callback
this lesson builds up to in the SNI section. Everything else is just a listener
binding a port.
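As a rough illustration of the opt-in table, this sketch turns an environment map into the list of listeners to start. The variable names and default ports come from the table; the module name `PlaneBootSketch` and the rule that only the literal value `"1"` switches a plane on are assumptions made for the example.

```elixir
defmodule PlaneBootSketch do
  # {flag env var, plane, port env var, default port}, taken from the table.
  @planes [
    {"WB_WEB", :control, "PORT", 4000},
    {"WB_PUBLIC", :content_http, "PUBLIC_PORT", 4001},
    {"WB_PUBLIC_TLS", :content_tls, "PUBLIC_TLS_PORT", 4443}
  ]

  # Takes the environment as a plain map so the sketch is easy to test.
  # Assumption: a plane is on only when its flag is exactly "1".
  def listeners(env) do
    for {flag, plane, port_var, default} <- @planes, env[flag] == "1" do
      port = env |> Map.get(port_var, Integer.to_string(default)) |> String.to_integer()
      {plane, port}
    end
  end
end

# A bare boot binds nothing.
IO.inspect(PlaneBootSketch.listeners(%{}))
# Only the content plane, on its default port.
IO.inspect(PlaneBootSketch.listeners(%{"WB_PUBLIC" => "1"}))
```

In a real boot you would pass `System.get_env()` instead of a literal map.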
One detail worth knowing because it surprises people: every listener is
dual-stack. It binds :: with ipv6_v6only turned off, so a single
socket answers both families. The reason is concrete — private platform
networks (the 6PN tunnels used between cloud machines) are IPv6-only, while a
local client on your laptop dials IPv4. One dual-stack socket serves both
without a second listener. (Container mode under krunvm is
the lone exception — its transport wedges with more than one acceptor or any
inet6 option, so desktop binds plain IPv4 with a single acceptor. A
depth-rung footnote, bisect-proven, that you'll never touch.)
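For the curious, a dual-stack bind can be sketched with OTP's `:gen_tcp` directly. Treat this as an illustration under assumptions: the runtime configures its listeners through its own HTTP server, and `ListenOptsSketch` is a hypothetical name. The `:inet6` and `ipv6_v6only` options themselves are standard OTP socket options.

```elixir
defmodule ListenOptsSketch do
  # Dual-stack: bind the IPv6 wildcard (::) and leave v6-only off, so
  # IPv4 clients can reach the same socket as IPv4-mapped addresses.
  def opts(:dual_stack) do
    [:binary, :inet6, ip: {0, 0, 0, 0, 0, 0, 0, 0}, ipv6_v6only: false,
     active: false, reuseaddr: true]
  end

  # Plain IPv4, the fallback shape described for container mode.
  def opts(:ipv4_only) do
    [:binary, :inet, ip: {0, 0, 0, 0}, active: false, reuseaddr: true]
  end
end

# Port 0 asks the OS for any free port, so the sketch never collides.
case :gen_tcp.listen(0, ListenOptsSketch.opts(:dual_stack)) do
  {:ok, socket} ->
    {:ok, {addr, port}} = :inet.sockname(socket)
    IO.inspect({addr, port}, label: "listening")
    :gen_tcp.close(socket)

  {:error, reason} ->
    # Hosts without IPv6 support land here.
    IO.inspect(reason, label: "dual-stack bind failed")
end
```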
the HOST is the address
On the content plane, the path doesn't pick the app — the host header
does. Domains.resolve/1 takes the incoming hostname and answers
with an app id. A registered host wins outright; if the host isn't registered,
the rule is dead simple: the leftmost DNS label is the app id. So
demo.apps.example resolves to the app named demo. One
hostname, one app, one origin.
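The two-step rule is small enough to sketch in a few lines. This is not the source of `Domains.resolve/1`; `HostResolveSketch`, the `registered` map argument, and the lowercasing step are assumptions made for illustration.

```elixir
defmodule HostResolveSketch do
  # `registered` is a hypothetical map of hostname => app id.
  def resolve(host, registered \\ %{}) do
    # Hostnames are case-insensitive, so normalize before looking up.
    host = String.downcase(host)

    case Map.fetch(registered, host) do
      # A registered host wins outright.
      {:ok, app} -> app
      # Otherwise the leftmost DNS label is the app id.
      :error -> host |> String.split(".", parts: 2) |> hd()
    end
  end
end

IO.inspect(HostResolveSketch.resolve("demo.apps.example"))
IO.inspect(HostResolveSketch.resolve("app.acme.dev", %{"app.acme.dev" => "storefront"}))
```

A real Host header can also carry a port suffix, which this sketch does not strip.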
That one-origin-per-app rule is the real tenant wall, and it's the browser's wall, not ours. Because each hosted app lives on its own origin, the page's JavaScript is bound by same-origin policy — it can never read or ride a cookie or session belonging to the control plane, because that plane lives on a different hostname entirely. Separate hostnames per plane mean a published app's code physically cannot reach across to the engine's credentials. The browser enforces it for free.
sequenceDiagram
participant B as browser
participant PW as PublicWeb (:4001)
participant D as Domains.resolve
participant FS as build/public/<app>
B->>PW: GET / (Host: demo.apps.example)
PW->>D: resolve("demo.apps.example")
D-->>PW: app = "demo" (leftmost label)
PW->>FS: read demo's static tree
FS-->>PW: index.html bytes
PW-->>B: 200 + x-served-by header + honesty comment
Walk that exchange: the browser asks for the root of
demo.apps.example, the plane hands the host to
resolve, gets back the app demo, reads that app's
static tree off the durable volume at WB_DATA/build/public/demo/,
and returns the bytes — stamped with the honesty header and comment the next
section covers. The path never named the app; the host did.
How does content land in build/public/<app> in the first
place? Through Workbooks.SitePublish — a host-brokered publish
step, the agent's publish tool. The host mirrors the app's
content/** and blog/** into the served tree using
pure Elixir File operations — no shell, no copy command running on
the OS — landing in the app named by WB_PUBLIC_APP or the tenant.
Publishing is itself a brokered host capability, which is exactly the
Dock discipline applied to getting bytes onto the
public plane.
pages, not FILES
Depth rung — skippable. The content plane serves pages, not raw files, and you are reading the proof of it right now. The resolution ladder for a request path is three steps:
- try the exact file at that path;
- else try <path>.html;
- else try <dir>/index.html.
And two canonicalizing redirects keep URLs clean: an inbound
.html URL gets a 301 to its extensionless form, and
/index.html collapses to /. That's why a request for
/learn/planes.html bounces to /learn/planes — the
page you're on. Clean URLs aren't a build-time convention here; they're a
serving-time guarantee, proven in the public-web tests.
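The ladder and the two redirects can be sketched as pure functions. Everything here is a hedged illustration: `PageLadderSketch` is a hypothetical module, the `exists?` callback stands in for a real file check, and using 301 for the `/index.html` collapse is an assumption (the text only states 301 for the `.html` case).

```elixir
defmodule PageLadderSketch do
  # Canonicalizing redirects, decided from the URL alone.
  def redirect(path) do
    cond do
      path == "/index.html" ->
        {:redirect, 301, "/"}

      String.ends_with?(path, ".html") ->
        {:redirect, 301, String.replace_suffix(path, ".html", "")}

      true ->
        :none
    end
  end

  # The three-step ladder, as site-relative candidate files in order.
  def candidates(path) do
    case String.trim(path, "/") do
      "" -> ["index.html"]
      rel -> [rel, rel <> ".html", rel <> "/index.html"]
    end
  end

  # `exists?` is a one-argument function: true when a regular file exists.
  def resolve(path, exists?), do: Enum.find(candidates(path), exists?)
end

files = MapSet.new(["index.html", "learn/planes.html", "learn/index.html"])
exists? = &MapSet.member?(files, &1)

IO.inspect(PageLadderSketch.redirect("/learn/planes.html"))
IO.inspect(PageLadderSketch.resolve("/learn/planes", exists?))
IO.inspect(PageLadderSketch.resolve("/learn", exists?))
```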
Path traversal is defended twice over. Any request path containing a
.. segment is rejected outright, and on top of that the resolved
filename is run through a Path.expand containment check to confirm
it still sits inside the site directory — which also covers symlinks that try
to point out. Belt and suspenders, because the attacker here is anonymous and
the internet is patient.
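Here is a minimal sketch of those two checks, assuming Unix-style paths and a request path that has already been percent-decoded. `SafePathSketch` is a hypothetical module, not the runtime's code. One caveat about the sketch itself: `Path.expand/1` works on the path string only, so this version does not follow symlinks on disk.

```elixir
defmodule SafePathSketch do
  # Returns {:ok, absolute_path} or {:error, reason}.
  def resolve(site_dir, request_path) do
    segments = String.split(request_path, "/", trim: true)
    root = Path.expand(site_dir)
    full = Path.expand(Path.join([root | segments]))

    cond do
      # Check 1: any ".." segment is rejected outright.
      ".." in segments -> {:error, :dotdot}
      # Check 2: the expanded path must still sit inside the site root.
      inside?(root, full) -> {:ok, full}
      true -> {:error, :outside_site}
    end
  end

  # The trailing "/" matters: without it, "/srv/site-evil" would pass
  # as being inside "/srv/site".
  def inside?(root, full) do
    full == root or String.starts_with?(full, root <> "/")
  end
end

IO.inspect(SafePathSketch.resolve("/srv/site", "/learn/planes.html"))
IO.inspect(SafePathSketch.resolve("/srv/site", "/../../etc/passwd"))
```

With the first check in place the second looks redundant, and in this sketch it mostly is. That redundancy is the belt-and-suspenders point.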
When an app has no static tree at all, the plane falls back to serving the stored workbook directly: a complete HTML document is served verbatim, with no double-wrapping, while org-source gets rendered by the OQL kernel into a document shell. Either way, what reaches the browser is the inert published form — exactly the file the workbook lesson draws its boundary around, never a live engine seam.
an honest SURFACE
Every response the content plane sends identifies itself. A
register_before_send hook sets a header on the way out:
x-served-by: workbooks-runtime
And every HTML body gets a comment injected before </head>
— idempotent, so it's never doubled if the page already carries it:
<!-- Served by the Workbooks runtime — public content plane (github.com/workbooks-sh) -->
That's the x-served-by header and the view-source comment you've
probably already noticed on these pages. They're not branding — they're a
standing claim about what served you, verifiable from outside with one
curl -I.
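The idempotent injection is easy to picture as a pure string function. A minimal sketch, assuming the comment is passed in as an argument and that a body without a `</head>` tag is left untouched; `HonestySketch` is a hypothetical name, and the real hook runs inside `register_before_send` rather than as a standalone function.

```elixir
defmodule HonestySketch do
  # Inserts `comment` immediately before the first </head>, at most once.
  def inject(html, comment) do
    cond do
      # Already stamped: return the page unchanged (idempotence).
      String.contains?(html, comment) ->
        html

      String.contains?(html, "</head>") ->
        String.replace(html, "</head>", comment <> "</head>", global: false)

      # Assumption: no head tag means nothing to stamp.
      true ->
        html
    end
  end
end

comment = "<!-- served by sketch -->"
page = "<html><head><title>t</title></head><body>hi</body></html>"
once = HonestySketch.inject(page, comment)
twice = HonestySketch.inject(once, comment)

IO.puts(once)
IO.inspect(once == twice, label: "idempotent")
```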
The plane also hosts two anonymous, read-only feeds — residents of the
public surface, named with an underscore prefix so they can't collide with a
workbook path. GET /_changes returns the app's real git log,
newest-first, capped at 30 entries, plus the keeper's status. It mirrors the
authed /rcp/changes route on the control plane, with one
difference that's the whole point: it resolves the app from the public
host, not from a tenant credential. The published app serves its own
verifiable history, to anyone, without logging in:
$ curl -s https://workbooks.sh/_changes | jq '.changes[0]'
{
  "sha": "d1db404",
  "msg": "docs(groundskeeper): live on fly — permanent bridge …"
}
GET /_activity goes one layer deeper — keeper status plus a tail
of step telemetry (the last 8 events for a single agent, or a merged last 60 in
the multi-agent shape, trimmed to 10 on the wire) and a narration thought or latest
daydream. It has two shapes depending on how many agents are running, and both
answer 404 with "no app for host" when the hostname doesn't resolve. This lesson
introduces the feeds only as plane residents; the
changelogs lesson tells their full story, and the
keepers lesson covers who produces the activity they
show.
one node, MANY domains
Now the aha that retires the reverse proxy. The thing people run Caddy or
nginx for — serving many custom domains, presenting the right TLS certificate
for whichever one is being dialed — is, it turns out, one function. Erlang's
:ssl layer takes a callback called sni_fun: on every
handshake it hands you the SNI hostname the client asked for, and you hand back
the cert to present. Wire that callback to a lookup and one BEAM node serves
any number of custom domains. No second machine, no Caddy sidecar.
The constraint that shapes everything is where sni_fun
runs: inside the TLS acceptor, on every single handshake. So it must be fast
and it must not block. It absolutely cannot call into a GenServer — a slow
reply would cause head-of-line blocking on the acceptor and stall every
pending handshake behind it. The registry solves this with a deliberate
read/write split: an ETS table (:protected,
read_concurrency: true) that sni_fun reads directly
with no process hop, and a single GenServer that serializes all writes so there
is exactly one writer. Reads are lock-free and instant; writes are rare and
ordered.
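The read/write split can be sketched with nothing but OTP. This is an illustration, not the registry's source: `RegistrySketch`, its table name, and the row shape are hypothetical. What it does show accurately is the mechanism: a `:protected` table that only its owning process can write, and a read function that touches ETS from the caller's own process with no message sent to the GenServer.

```elixir
defmodule RegistrySketch do
  use GenServer

  @table :registry_sketch

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Write path: serialized through the single owning process.
  def put(host, row), do: GenServer.call(__MODULE__, {:put, host, row})

  # Read path: runs in the caller's process, straight against ETS.
  # This is the shape a per-handshake callback can afford to call.
  def lookup(host) do
    case :ets.lookup(@table, host) do
      [{^host, row}] -> {:ok, row}
      [] -> :error
    end
  end

  @impl true
  def init(:ok) do
    # :protected = any process may read, only the owner may write.
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:put, host, row}, _from, state) do
    :ets.insert(@table, {host, row})
    {:reply, :ok, state}
  end
end

{:ok, _pid} = RegistrySketch.start_link()
:ok = RegistrySketch.put("app.acme.dev", %{app: "storefront"})
IO.inspect(RegistrySketch.lookup("app.acme.dev"))
IO.inspect(RegistrySketch.lookup("stranger.example"))
```

A direct `:ets.insert/2` from any other process raises, which is what keeps the single-writer rule from being a convention only.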
sequenceDiagram
participant C as client (TLS)
participant A as TLS acceptor
participant S as sni_fun
participant E as ETS :wb_domains
C->>A: ClientHello (SNI = app.acme.com)
A->>S: sni_fun(~c"app.acme.com")
S->>E: lookup (no GenServer, lock-free read)
E-->>S: row → certfile + keyfile
S-->>A: [certfile: …, keyfile: …]
A-->>C: handshake completes
Note over C,A: only now does HTTP exist → PublicWeb
Follow it as a story: the client says hello and names the host it wants; the
acceptor calls sni_fun with that name as a charlist — the form
:ssl always uses; sni_fun reads ETS directly and
finds the row; it returns the cert and key file paths; the handshake completes;
and only then does an HTTP request exist for PublicWeb to
serve. The certs themselves are materialized to files on disk —
build/cache/tls/<safe-host>.crt and .key, the
key written chmod 0600 — because :ssl wants file
paths, not PEM blobs in memory.
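Materializing a cert pair to disk might look roughly like this. It is a sketch under stated assumptions: `CertFilesSketch` is a hypothetical module, and the `safe_host/1` rule (lowercase, keep letters, digits, dots, and hyphens, replace everything else with an underscore) is invented for illustration because the text does not spell out how `<safe-host>` is derived.

```elixir
defmodule CertFilesSketch do
  # Hypothetical sanitizer so a hostname can never introduce a path separator.
  def safe_host(host) do
    host |> String.downcase() |> String.replace(~r/[^a-z0-9.-]/, "_")
  end

  # Writes <dir>/<safe-host>.crt and .key and returns the two paths.
  def materialize(dir, host, cert_pem, key_pem) do
    File.mkdir_p!(dir)
    base = Path.join(dir, safe_host(host))
    certfile = base <> ".crt"
    keyfile = base <> ".key"

    File.write!(certfile, cert_pem)

    # Create the key file empty, tighten its mode, then write the secret,
    # so the key material is never on disk with looser permissions.
    File.write!(keyfile, "")
    File.chmod!(keyfile, 0o600)
    File.write!(keyfile, key_pem)

    [certfile: certfile, keyfile: keyfile]
  end
end

dir = Path.join(System.tmp_dir!(), "tls_sketch_#{System.unique_integer([:positive])}")
paths = CertFilesSketch.materialize(dir, "app.acme.dev", "CERT PEM", "KEY PEM")
IO.inspect(paths)
```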
the handshake is the BOUNCER
The same lookup that picks the cert is also the admission gate — and this is
the sharp edge of the design. When sni_fun is handed a host that
isn't registered, it returns the empty list []. With no cert to
present, the TLS handshake cannot complete. A stranger doesn't get a 403
page. They don't get a 404. They don't get an HTTP response at all — because
there is no TLS session for HTTP to ride on. Rejection happens before HTTP
exists. The registry is the admission gate for the public plane.
iex> Workbooks.Domains.attach("app.acme.dev", "acme", "storefront")
{:ok, %{host: "app.acme.dev", tenant: "acme", app: "storefront", status: :pending}}
iex> Workbooks.Domains.put_cert("app.acme.dev", cert_pem, key_pem)
{:ok, %{status: :live, certfile: "build/cache/tls/app.acme.dev.crt", …}}
iex> Workbooks.Domains.sni(~c"app.acme.dev")
[certfile: "build/cache/tls/app.acme.dev.crt", keyfile: "build/cache/tls/app.acme.dev.key"]
iex> Workbooks.Domains.sni(~c"stranger.example")
[] ← no cert → the TLS handshake cannot complete
From outside, the difference is stark.
openssl s_client -servername stranger.example against the TLS port
returns ssl handshake failure — no connection, no information, nothing
to probe. The same command with an attached host completes cleanly. An attacker
scanning your node learns only that some hosts work and unknown ones are
invisible at the transport layer.
stateDiagram-v2
[*] --> unknown
unknown --> pending: attach(host, tenant, app)
pending --> live: put_cert(cert, key)
live --> [*]: serves over TLS
unknown --> refused: sni() → [] → TLS handshake fails
refused --> [*]
That lifecycle — unknown, then pending on attach, then live once a cert is installed — encodes a deliberate refusal. Certs are issued on attach, not on handshake. Caddy's on-demand model issues a certificate the first time anyone dials a hostname, which turns your server into a cert mill an attacker can drive by spraying random hostnames; and real ACME issuance takes seconds, far too long to stall a handshake behind. So here a verified attach happens first, the cert is provisioned out of band, and only an already-issued cert ever gets presented inside the millisecond budget of a handshake. The two-step isn't bureaucracy — it's the anti-footgun.
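The lifecycle reads naturally as a tiny state machine over a map. This sketch is not `Workbooks.Domains`; `DomainLifecycleSketch` is hypothetical, the registry is a plain map passed in and out, and two behaviors are assumptions: that a pending host (attached, no cert yet) is refused like an unknown one, and that `put_cert` on an unattached host is an error.

```elixir
defmodule DomainLifecycleSketch do
  # unknown -> pending
  def attach(reg, host, tenant, app) do
    Map.put(reg, host, %{tenant: tenant, app: app, status: :pending})
  end

  # pending -> live. Assumption: the host must already be attached.
  def put_cert(reg, host, certfile, keyfile) do
    case reg do
      %{^host => row} ->
        live = Map.merge(row, %{status: :live, certfile: certfile, keyfile: keyfile})
        {:ok, Map.put(reg, host, live)}

      _ ->
        {:error, :not_attached}
    end
  end

  # The SNI name arrives as a charlist. Only a live row yields a cert;
  # everything else gets [], which leaves the handshake nothing to present.
  def sni(reg, host_charlist) do
    case Map.get(reg, List.to_string(host_charlist)) do
      %{status: :live, certfile: c, keyfile: k} -> [certfile: c, keyfile: k]
      _ -> []
    end
  end
end

reg = DomainLifecycleSketch.attach(%{}, "app.acme.dev", "acme", "storefront")
IO.inspect(DomainLifecycleSketch.sni(reg, ~c"app.acme.dev"), label: "pending")

{:ok, reg} = DomainLifecycleSketch.put_cert(reg, "app.acme.dev", "a.crt", "a.key")
IO.inspect(DomainLifecycleSketch.sni(reg, ~c"app.acme.dev"), label: "live")
IO.inspect(DomainLifecycleSketch.sni(reg, ~c"stranger.example"), label: "unknown")
```

Note that nothing in `sni/2` can issue a certificate. Issuance lives entirely on the write side, which is the attach-then-provision ordering the paragraph describes.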
the shape in PRODUCTION
workbooks.sh eats this dogfood — this very lesson is served by
Workbooks.PublicWeb. The deployed reference is
web/deploy/fly.toml for the app wb-site, running the
same ghcr.io/workbooks-sh/runtime:latest image you'd run, with a
durable volume mounted at /data:
# web/deploy/fly.toml: two planes, two services
[env]
  WB_DATA = "/data"
  WB_PUBLIC = "1"              ← content plane on
  WB_PUBLIC_APP = "wb-site"    ← serves /data/build/public/wb-site
  PUBLIC_PORT = "4001"
  WB_TENANT = "wb-site"

[[services]]                   ← public: 80/443 → internal 4001 (PublicWeb)
  internal_port = 4001
  auto_stop_machines = "off"   ← this is an origin, not a demo

[[services]]                   ← control: port 4000, TLS, locked by WB_PUBLIC_BEARER, authed only
  internal_port = 4000
Two services, two planes, exactly as the table at the top of this lesson
draws them. Public 80/443 maps to the anonymous content plane on 4001;
auto_stop_machines is off because an origin can't go to sleep
between visitors. The control plane sits on 4000, exposed but locked by a
256-bit WB_PUBLIC_BEARER shared secret — present it and you're in,
absent or wrong and it's a flat 401 with no dev fallback. Locked is not the
same as hidden; the bearer is covered in the tokens
lesson, and the full deploy shape in deployments.
One honest framing about the reverse proxy. There's no proxy
between the planes — that's the claim this lesson makes good on. But
Cloudflare does sit in front of the whole node as a CDN and edge, and on this
deploy Fly terminates the public TLS itself. That's not a contradiction — a
lone BEAM VM can't self-defend against a volumetric DDoS, and the plan says so
plainly: the edge is optional defense-in-depth, orthogonal to the plane
split, not a dependency of it. (A consequence worth noting: because Fly
terminates TLS here, the SNI/4443 lane isn't what serves workbooks.sh today.
WB_PUBLIC_TLS is the primitive for serving custom domains
directly; don't conflate the two.)
what ISN'T built
The honest accounting, all of it verifiable in the source. The plane split is real and shipped; several of the things you'd want around it are still plan, not module.
- ACME is not built. There is no Workbooks.Acme module; grep returns zero definitions. Automatic cert issuance and renewal is a planned phase. Today certificates are installed manually via put_cert/3. The SNI machinery is real and shipped; what feeds it certs automatically is not.
- The registry is not durable. The domain table is fresh ETS on every boot. There's no backend-backed reload, so a restart forgets which hosts were attached. The cert files persist on disk, but the rows that map host to app are gone. A durable backing table is in the plan; it isn't implemented.
- No attach flow yet. Nothing in the runtime calls attach or put_cert except tests; the agent-assisted DNS-and-attach flow is a later phase. The primitives work; the wizard that drives them for you doesn't exist.
- No rate limiting on the content plane yet. The plan names PlugAttack and Hammer as the intended tools; the checklist item is unchecked.
- One node is one blast radius. The honest downside of one BEAM node, named in the plan: the planes share a process and a machine, so a crash is a shared crash. The deliberate mitigation is that the modules are kept separable: the content plane could become its own node later with, in the plan's words, zero redesign. That escape hatch is designed in, not yet exercised.
None of this weakens the core claim — an anonymous request can't reach your engine, because the routes aren't there. It just means the custom-domain story is a working primitive with the convenience layer still to come.
questions people actually ASK
Can a public visitor reach my agents or secrets?
No — and not because a rule says no. The content plane has no route to the Dock, no build route, no agent route, no secret-escrow route. Those functions are absent from the module, and non-GET requests have nothing to match at all. You can confirm it by reading one file. There's no path to misconfigure.
Do I need nginx or Caddy in front?
Not between the planes — that's the whole point. One BEAM node runs both
surfaces and, via sni_fun, serves any number of custom domains
itself. An edge CDN like Cloudflare in front is optional defense-in-depth
against volumetric floods, orthogonal to the split. workbooks.sh uses one;
your engine doesn't require one.
What's the x-served-by header I keep seeing?
An honesty marker. Every content-plane response carries
x-served-by: workbooks-runtime, and every HTML page carries a
matching view-source comment before </head>. It's a
standing, externally verifiable claim about what served you — run
curl -I on any page here and you'll see it.
Custom domain today — what's actually possible?
The primitive works: attach a host, put_cert a
certificate and key, and sni_fun presents it on every handshake —
one node, many domains. What's not built yet is automatic issuance (no ACME
module) and a durable registry (a restart forgets attachments). So custom
domains are possible with manually installed certs, today; hands-off issuance
is coming.
Why does my unknown test domain fail TLS instead of 404ing?
Because the registry is the admission gate, and it works before
HTTP exists. An unregistered host gets [] from
sni_fun — no cert, so the handshake can't complete and there's no
HTTP layer to return a 404 on. It's a feature: strangers get no session, no
response, and no information to probe.
Is the control plane on the public internet?
It can be reachable, but locked is not hidden. On a cloud deploy the control
plane is gated by a 256-bit WB_PUBLIC_BEARER — the right bearer
gets in, a wrong or absent one gets a flat 401 with no dev fallback. It's a
different port and, properly, a different hostname from the content plane, so
a published page's JavaScript can't ride its session.
keep GOING
Planes are the HTTP face of the Nexus security model — the inbound side of "isolation is software." Here's where the threads continue.