
two planes, ONE machine

A plane is a complete, separate HTTP surface — its own listener, its own router, its own trust posture. One serves your engine to people you trust; the other serves bytes to the open internet. Same BEAM node, no reverse proxy. The isolation isn't a firewall — it's that the dangerous routes don't exist on the public side.

A lone figure stands on a narrow catwalk between two monumental glowing terminals — one a fortified amber control tower bristling with locks and gates, the other a wide-open green public concourse where light streams out to a crowd — both rising from a single bright machine-island, 1970s sci-fi style

one machine, two AUDIENCES

Your Nexus is the most privileged thing you own. The security section of the Nexus lesson spent its whole length protecting what lives there: secrets, agents, build tools, and the Dock, the seam that runs real capabilities on demand. Now you want to take a single page out of it and put that page on the open internet, where the least-trusted requests on Earth will hammer it.

Every instinct says those two facts are incompatible. The dread is specific and correct: if my published site and my engine share a machine, a request to the site is a request to the engine. So the conventional answer is a stack — nginx in front to terminate and route, a separate static host on the side to serve the safe bytes, a cert manager like Caddy to juggle domains, and a proxy config whose entire job is to keep the public traffic from ever reaching the privileged ports. Three machines and a config file standing between disaster and you, and any one of them misconfigured is the incident.

This lesson is the other answer. Same box, both audiences, and the wall between them is not a proxy you can misconfigure. It's the absence of a road.

two planes, one NODE

plane /pleɪn/ noun

1. a complete, separate HTTP surface — its own listener, its own router, its own trust posture — running on the same node as the others. The control plane is authenticated and does things; the content plane is anonymous, GET-only, and serves bytes.

Two separate Plug routers on two separate Bandit listeners, in one BEAM node: Workbooks.Web for control, Workbooks.PublicWeb for content. They share a process, a volume, and a runtime — and share almost nothing else. The control plane is where deploys, the Dock, agents, builds, and key escrow live, every one of them behind one authentication plug. The content plane has none of that — and, more importantly, has no code that could do any of it.

| plane | module | listener env | default port | auth | what exists there |
| --- | --- | --- | --- | --- | --- |
| control | Workbooks.Web | WB_WEB=1 | 4000 | authed (whole router) | /api/*, /w/:id/call (Dock), build, commands, agents, key escrow |
| content | Workbooks.PublicWeb | WB_PUBLIC=1 | 4001 | none (anonymous) | published bytes, GET only, /_changes, /_activity, /health |

That table is the whole design of the split, and it's the design honestly: one plane to operate the engine, one plane to publish from it, and a hard line drawn so an anonymous request on the second can never become an authenticated action on the first.

security by routes that DON'T exist

Here's the move that makes the whole thing safe, and it's not the move you'd guess. The content plane isn't the engine with auth bolted on. There is no Workbooks.Auth plug on PublicWeb at all — and there doesn't need to be, because there's nothing on that plane to protect. No Dock route. No build. No commands. No agents. No secret escrow. Those code paths are not guarded on the content plane; they are absent from it. The module never calls into them, so they cannot be reached, misconfigured into reach, or fuzzed into reach.

And the router is GET-only by construction. Every clause is a get; anything that isn't a GET falls straight through to a single catch-all that answers 404. A POST doesn't get rejected by a rule — it has nothing to match, so it cannot do anything. Contrast the one line that opens the control plane's router:

# runtime/host/web.ex: the control plane
plug(Workbooks.Auth)        # gates the ENTIRE router; everything below needs a credential

# runtime/host/public_web.ex: the content plane
plug(:match)                # no Auth: just route, dispatch, and a GET-only surface
plug(:dispatch)
match _, do: send_resp(conn, 404, "not found")   # non-GET lands here; nowhere else to go

The payoff is for whoever does your security review. The question "what can an anonymous request reach on the public internet?" is answered by reading one file end to end — about 530 lines — and confirming the dangerous verbs simply aren't in it. Not a rule set to audit, not a policy to verify line by line. Absence is the easiest property in the world to check.
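One way to picture a GET-only surface is as a function whose only useful clauses match GET. The sketch below is a hypothetical stand-in written as plain function clauses, not the real Plug router in public_web.ex; the module name, the return tuples, and the response bodies are invented for illustration.

```elixir
defmodule PublicRouteSketch do
  # Hypothetical stand-in for a GET-only router: every clause that does
  # anything matches the "GET" method. No clause exists for other verbs.
  def route("GET", "/health"), do: {200, "ok"}
  def route("GET", "/_changes"), do: {200, "changes feed"}
  def route("GET", "/_activity"), do: {200, "activity feed"}
  def route("GET", path), do: {200, "bytes for " <> path}

  # Catch-all: any other verb has nothing to match and ends here.
  def route(_method, _path), do: {404, "not found"}
end
```

A reviewer's check then reduces to confirming that no clause names a verb other than GET, which is a property of the file's shape rather than of a rule set.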

flowchart TD
  req["inbound request"]
  req --> which{"which plane?<br/>(by listener / host)"}
  which -- "control :4000" --> auth{"auth ladder<br/>passes?"}
  auth -- "yes" --> rich["Dock · build · agents · keys<br/>the rich, dangerous surface"]
  auth -- "no" --> n401["401: no dev fallback"]
  which -- "content :4001" --> verb{"GET?"}
  verb -- "yes" --> resolve["resolve host → app → bytes"]
  verb -- "no (POST/PUT/…)" --> n404["404: no route exists"]
  style rich fill:#f3c5a3,stroke:#121316
  style resolve fill:#aee5c2,stroke:#121316,stroke-width:2.5px
  style n401 fill:#ffffff,stroke:#121316
  style n404 fill:#ffffff,stroke:#121316

Read the right-hand path: a request lands, the node asks which plane it arrived on, and on the content plane the only branch that goes anywhere is GET → resolve → bytes. Every other verb terminates at 404. There is no arrow from the content plane into the dangerous box on the left, because in the code there is no function to draw it from.

how the planes COME UP

Each plane is opt-in, so a bare boot binds nothing — the demo can start without grabbing a port it doesn't need. You turn a plane on with an environment variable, and it listens on its own port:

| env var | turns on | port env | default | notes |
| --- | --- | --- | --- | --- |
| WB_WEB=1 | control plane | PORT | 4000 | the authed surface |
| WB_PUBLIC=1 | content plane (HTTP) | PUBLIC_PORT | 4001 | anonymous, GET-only |
| WB_PUBLIC_TLS=1 | content plane (HTTPS) | PUBLIC_TLS_PORT | 4443 | adds sni_fun, the per-domain cert lane |

The TLS plane is the interesting one: switching it on installs a sni_fun into the transport — the per-connection cert callback this lesson builds up to in the SNI section. Everything else is just a listener binding a port.
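The opt-in rule in the table above can be sketched as a pure function over the environment. This is a minimal sketch, not the runtime's boot code: the module name, the plane atoms, and the assumption that only the exact value "1" switches a plane on are mine.

```elixir
defmodule PlaneConfigSketch do
  # {switch variable, plane name, port variable, default port}, per the table.
  @planes [
    {"WB_WEB", :control, "PORT", 4000},
    {"WB_PUBLIC", :content_http, "PUBLIC_PORT", 4001},
    {"WB_PUBLIC_TLS", :content_tls, "PUBLIC_TLS_PORT", 4443}
  ]

  # Takes the environment as a map (System.get_env/0 returns one) and
  # returns only the planes whose switch is set. A bare env yields [].
  def enabled(env \\ System.get_env()) do
    for {switch, plane, port_var, default} <- @planes, env[switch] == "1" do
      {plane, port(env[port_var], default)}
    end
  end

  # Assumption: an unset port variable falls back to the documented default.
  defp port(nil, default), do: default
  defp port(value, _default), do: String.to_integer(value)
end
```

Passing the environment in as a map keeps the sketch testable without touching the real process environment.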

One detail worth knowing because it surprises people: every listener is dual-stack. It binds :: with the ipv6_v6only socket option turned off, so a single socket answers both address families. The reason is concrete: private platform networks (the 6PN tunnels used between cloud machines) are IPv6-only, while a local client on your laptop dials IPv4. One dual-stack socket serves both without a second listener. (Container mode under krunvm is the lone exception: its transport wedges with more than one acceptor or any inet6 option, so desktop binds plain IPv4 with a single acceptor. A depth-rung footnote, bisect-proven, that you'll never touch.)
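To see what dual-stack means at the socket level, here is a minimal sketch using Erlang's :gen_tcp directly. It assumes a host with IPv6 enabled, and it is not how the runtime configures its Bandit listeners, which this lesson does not show; the module name is hypothetical.

```elixir
defmodule DualStackSketch do
  # Opens one IPv6 listening socket that also accepts IPv4 clients.
  # Port 0 asks the OS for any free port.
  def listen(port \\ 0) do
    :gen_tcp.listen(port, [
      :binary,
      :inet6,
      # Bind the IPv6 wildcard address "::".
      {:ip, {0, 0, 0, 0, 0, 0, 0, 0}},
      # false lets IPv4 clients reach the same socket.
      {:ipv6_v6only, false},
      {:active, false},
      {:reuseaddr, true}
    ])
  end
end
```

On a host without IPv6 support the call returns an {:error, reason} tuple instead of a socket, so callers should match on both outcomes.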

the HOST is the address

On the content plane, the path doesn't pick the app — the host header does. Domains.resolve/1 takes the incoming hostname and answers with an app id. A registered host wins outright; if the host isn't registered, the rule is dead simple: the leftmost DNS label is the app id. So demo.apps.example resolves to the app named demo. One hostname, one app, one origin.

That one-origin-per-app rule is the real tenant wall, and it's the browser's wall, not ours. Because each hosted app lives on its own origin, the page's JavaScript is bound by same-origin policy — it can never read or ride a cookie or session belonging to the control plane, because that plane lives on a different hostname entirely. Separate hostnames per plane mean a published app's code physically cannot reach across to the engine's credentials. The browser enforces it for free.
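The two-step rule (a registered host wins, otherwise the leftmost label is the app id) fits in a few lines. This sketch uses a plain map in place of the real registry and ignores details the lesson does not cover, such as a port suffix on the Host header; the module name and the map argument are hypothetical, and Domains.resolve/1 itself may differ.

```elixir
defmodule ResolveSketch do
  # `registered` is a plain map of host => app id, standing in for the registry.
  def resolve(host, registered \\ %{}) do
    case Map.fetch(registered, host) do
      # A registered host wins outright.
      {:ok, app} -> app
      # Otherwise the leftmost DNS label is the app id.
      :error -> host |> String.split(".") |> hd()
    end
  end
end
```

So ResolveSketch.resolve("demo.apps.example") yields "demo", while a host present in the map yields whatever app it was registered to.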

sequenceDiagram
  participant B as browser
  participant PW as PublicWeb (:4001)
  participant D as Domains.resolve
  participant FS as build/public/<app>
  B->>PW: GET / (Host: demo.apps.example)
  PW->>D: resolve("demo.apps.example")
  D-->>PW: app = "demo" (leftmost label)
  PW->>FS: read demo's static tree
  FS-->>PW: index.html bytes
  PW-->>B: 200 + x-served-by header + honesty comment
  

Walk that exchange: the browser asks for the root of demo.apps.example, the plane hands the host to resolve, gets back the app demo, reads that app's static tree off the durable volume at WB_DATA/build/public/demo/, and returns the bytes — stamped with the honesty header and comment the next section covers. The path never named the app; the host did.

How does content land in build/public/<app> in the first place? Through Workbooks.SitePublish — a host-brokered publish step, the agent's publish tool. The host mirrors the app's content/** and blog/** into the served tree using pure Elixir File operations — no shell, no copy command running on the OS — landing in the app named by WB_PUBLIC_APP or the tenant. Publishing is itself a brokered host capability, which is exactly the Dock discipline applied to getting bytes onto the public plane.

pages, not FILES

Depth rung — skippable. The content plane serves pages, not raw files, and you are reading the proof of it right now. The resolution ladder for a request path is three steps:

  • try the exact file at that path;
  • else try <path>.html;
  • else try <dir>/index.html.

And two canonicalizing redirects keep URLs clean: an inbound .html URL gets a 301 to its extensionless form, and /index.html collapses to /. That's why a request for /learn/planes.html bounces to /learn/planes — the page you're on. Clean URLs aren't a build-time convention here; they're a serving-time guarantee, proven in the public-web tests.
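The ladder and the two redirects can be sketched as pure functions. Everything named here is hypothetical: the module, the :serve and {:redirect, _} return values, and the injected exists? function, which replaces a real file check so the example runs without a site tree. How the real plane treats directories or nested index.html files is not specified in this lesson, so the sketch implements only the two stated redirect rules.

```elixir
defmodule PageLadderSketch do
  # Redirects: "/index.html" collapses to "/", and any other ".html" URL
  # goes to its extensionless form. Everything else is served.
  def canonical("/index.html"), do: {:redirect, "/"}

  def canonical(path) do
    if String.ends_with?(path, ".html") do
      {:redirect, String.replace_suffix(path, ".html", "")}
    else
      :serve
    end
  end

  # The three-step ladder, as candidate files relative to the site root.
  def candidates(path) do
    case String.trim(path, "/") do
      "" -> ["index.html"]
      rel -> [rel, rel <> ".html", rel <> "/index.html"]
    end
  end

  # Returns the first candidate that exists, or nil when none does.
  def resolve(path, exists?) do
    Enum.find(candidates(path), exists?)
  end
end
```

With a tree containing learn/planes.html, a request for /learn/planes resolves on the second rung, and a request for /learn/planes.html never reaches the ladder because it is redirected first.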

Path traversal is defended twice over. Any request path containing a .. segment is rejected outright, and on top of that the resolved filename is run through a Path.expand containment check to confirm it still sits inside the site directory — which also covers symlinks that try to point out. Belt and suspenders, because the attacker here is anonymous and the internet is patient.
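The two checks can be sketched together as one function. This is an illustration under assumptions, not the runtime's code: the module and function names are hypothetical. Note that Path.expand normalises a path lexically and does not follow symlinks, so this sketch on its own does not demonstrate the symlink coverage described above.

```elixir
defmodule ContainSketch do
  # Returns {:ok, absolute_path} only when the request path has no ".."
  # segment AND the expanded result still sits under the site directory.
  def safe_path(site_dir, request_path) do
    root = Path.expand(site_dir)
    segments = String.split(request_path, "/", trim: true)

    if ".." in segments do
      # First defence: reject any ".." segment outright.
      :error
    else
      # Second defence: expand, then confirm containment by prefix.
      full = Path.expand(Path.join([root | segments]))

      if full == root or String.starts_with?(full, root <> "/") do
        {:ok, full}
      else
        :error
      end
    end
  end
end
```

The prefix test appends a trailing slash to the root so that a sibling directory such as /srv/site-other cannot pass as being inside /srv/site.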

When an app has no static tree at all, the plane falls back to serving the stored workbook directly: a complete HTML document is served verbatim, with no double-wrapping, while org-source gets rendered by the OQL kernel into a document shell. Either way, what reaches the browser is the inert published form — exactly the file the workbook lesson draws its boundary around, never a live engine seam.

an honest SURFACE

Every response the content plane sends identifies itself. A register_before_send hook sets a header on the way out:

x-served-by: workbooks-runtime

And every HTML body gets a comment injected before </head> — idempotent, so it's never doubled if the page already carries it:

<!-- Served by the Workbooks runtime — public content plane (github.com/workbooks-sh) -->

That's the x-served-by header and the view-source comment you've probably already noticed on these pages. They're not branding — they're a standing claim about what served you, verifiable from outside with one curl -I.
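Idempotent injection is simple to sketch: skip the page if the marker is already present, otherwise place it before the first closing head tag. The module name is hypothetical, the comment is passed in as an argument rather than hard-coded, and leaving a page without </head> untouched is my assumption, since the lesson does not say what happens in that case.

```elixir
defmodule HonestySketch do
  # Inserts `comment` before the first "</head>" unless it is already there.
  def inject(html, comment) do
    cond do
      # Already stamped: return the page unchanged, so it is never doubled.
      String.contains?(html, comment) ->
        html

      String.contains?(html, "</head>") ->
        String.replace(html, "</head>", comment <> "</head>", global: false)

      # Assumption: with no </head> to anchor on, leave the body alone.
      true ->
        html
    end
  end
end
```

Running inject/2 on its own output returns the same string, which is the property the word idempotent is claiming.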

The plane also hosts two anonymous, read-only feeds — residents of the public surface, named with an underscore prefix so they can't collide with a workbook path. GET /_changes returns the app's real git log, newest-first, capped at 30 entries, plus the keeper's status. It mirrors the authed /rcp/changes route on the control plane, with one difference that's the whole point: it resolves the app from the public host, not from a tenant credential. The published app serves its own verifiable history, to anyone, without logging in:

$ curl -s https://workbooks.sh/_changes | jq '.changes[0]'
{
  "sha": "d1db404",
  "msg": "docs(groundskeeper): live on fly — permanent bridge …"
}

GET /_activity goes one layer deeper: keeper status plus a tail of step telemetry (the last 8 events for a single agent, or a merged last 60 in the multi-agent shape, trimmed to 10 on the wire) and a narration thought or latest daydream. It has two shapes depending on how many agents are running, and both answer 404 with the message "no app for host" when the hostname doesn't resolve. This lesson introduces the feeds only as plane residents; the changelogs lesson tells their full story, and the keepers lesson covers who produces the activity they show.

one node, MANY domains

Now the aha that retires the reverse proxy. The thing people run Caddy or nginx for — serving many custom domains, presenting the right TLS certificate for whichever one is being dialed — is, it turns out, one function. Erlang's :ssl layer takes a callback called sni_fun: on every handshake it hands you the SNI hostname the client asked for, and you hand back the cert to present. Wire that callback to a lookup and one BEAM node serves any number of custom domains. No second machine, no Caddy sidecar.

The constraint that shapes everything is where sni_fun runs: inside the TLS acceptor, on every single handshake. So it must be fast and it must not block. It absolutely cannot call into a GenServer — a slow reply would cause head-of-line blocking on the acceptor and stall every pending handshake behind it. The registry solves this with a deliberate read/write split: an ETS table (:protected, read_concurrency: true) that sni_fun reads directly with no process hop, and a single GenServer that serializes all writes so there is exactly one writer. Reads are lock-free and instant; writes are rare and ordered.
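The read/write split described above can be sketched with a GenServer that owns a :protected ETS table. This is a minimal illustration, not the Workbooks.Domains module: the module name, the table name :wb_domains_sketch, the put/3 function, and the row shape are all assumptions.

```elixir
defmodule RegistrySketch do
  use GenServer

  @table :wb_domains_sketch

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Writes go through the single owner process, so they are serialized.
  def put(host, certfile, keyfile) do
    GenServer.call(__MODULE__, {:put, host, certfile, keyfile})
  end

  # Reads hit ETS directly from the caller's process: no message is sent.
  # The hostname arrives as a charlist, the form :ssl uses.
  def sni(host) when is_list(host) do
    case :ets.lookup(@table, List.to_string(host)) do
      [{_host, certfile, keyfile}] -> [certfile: certfile, keyfile: keyfile]
      [] -> []
    end
  end

  @impl true
  def init(:ok) do
    # :protected means any process may read, but only the owner may write.
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:put, host, certfile, keyfile}, _from, state) do
    :ets.insert(@table, {host, certfile, keyfile})
    {:reply, :ok, state}
  end
end
```

Because the table is :protected, an insert attempted from any process other than the owner raises, which is what makes the GenServer the only writer in practice and not just by convention.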

sequenceDiagram
  participant C as client (TLS)
  participant A as TLS acceptor
  participant S as sni_fun
  participant E as ETS :wb_domains
  C->>A: ClientHello (SNI = app.acme.com)
  A->>S: sni_fun(~c"app.acme.com")
  S->>E: lookup (no GenServer — lock-free read)
  E-->>S: row → certfile + keyfile
  S-->>A: [certfile: …, keyfile: …]
  A-->>C: handshake completes
  Note over C,A: only now does HTTP exist → PublicWeb
  

Follow it as a story: the client says hello and names the host it wants; the acceptor calls sni_fun with that name as a charlist — the form :ssl always uses; sni_fun reads ETS directly and finds the row; it returns the cert and key file paths; the handshake completes; and only then does an HTTP request exist for PublicWeb to serve. The certs themselves are materialized to files on disk — build/cache/tls/<safe-host>.crt and .key, the key written chmod 0600 — because :ssl wants file paths, not PEM blobs in memory.
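Materializing a PEM pair to disk with an owner-only key might look like the sketch below. The module name, the function, and the rule used to derive a safe file name from the host are hypothetical; the lesson only states the resulting paths and the 0600 mode.

```elixir
defmodule CertFilesSketch do
  # Writes the PEM pair under `dir` and returns {certfile, keyfile}.
  def materialize(dir, host, cert_pem, key_pem) do
    File.mkdir_p!(dir)

    # Assumption: anything outside letters, digits, dot and hyphen becomes "_".
    safe = String.replace(host, ~r/[^a-z0-9.-]/i, "_")
    certfile = Path.join(dir, safe <> ".crt")
    keyfile = Path.join(dir, safe <> ".key")

    File.write!(certfile, cert_pem)

    # Create the key file empty, restrict it to the owner, then write the
    # secret, so the key never sits on disk with wider permissions.
    File.write!(keyfile, "")
    File.chmod!(keyfile, 0o600)
    File.write!(keyfile, key_pem)

    {certfile, keyfile}
  end
end
```

The returned paths are the kind of values an sni_fun would hand back as certfile and keyfile.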

the handshake is the BOUNCER

The same lookup that picks the cert is also the admission gate — and this is the sharp edge of the design. When sni_fun is handed a host that isn't registered, it returns the empty list []. With no cert to present, the TLS handshake cannot complete. A stranger doesn't get a 403 page. They don't get a 404. They don't get an HTTP response at all — because there is no TLS session for HTTP to ride on. Rejection happens before HTTP exists. The registry is the admission gate for the public plane.

iex> Workbooks.Domains.attach("app.acme.dev", "acme", "storefront")
{:ok, %{host: "app.acme.dev", tenant: "acme", app: "storefront", status: :pending}}

iex> Workbooks.Domains.put_cert("app.acme.dev", cert_pem, key_pem)
{:ok, %{status: :live, certfile: "build/cache/tls/app.acme.dev.crt", …}}

iex> Workbooks.Domains.sni(~c"app.acme.dev")
[certfile: "build/cache/tls/app.acme.dev.crt", keyfile: "build/cache/tls/app.acme.dev.key"]

iex> Workbooks.Domains.sni(~c"stranger.example")
[]                          # no cert, so the TLS handshake cannot complete

From outside, the difference is stark. openssl s_client -servername stranger.example against the TLS port returns ssl handshake failure — no connection, no information, nothing to probe. The same command with an attached host completes cleanly. An attacker scanning your node learns only that some hosts work and unknown ones are invisible at the transport layer.

stateDiagram-v2
  [*] --> unknown
  unknown --> pending: attach(host, tenant, app)
  pending --> live: put_cert(cert, key)
  live --> [*]: serves over TLS
  unknown --> refused: sni() → [] → TLS handshake fails
  refused --> [*]
  

That lifecycle — unknown, then pending on attach, then live once a cert is installed — encodes a deliberate refusal. Certs are issued on attach, not on handshake. Caddy's on-demand model issues a certificate the first time anyone dials a hostname, which turns your server into a cert mill an attacker can drive by spraying random hostnames; and real ACME issuance takes seconds, far too long to stall a handshake behind. So here a verified attach happens first, the cert is provisioned out of band, and only an already-issued cert ever gets presented inside the millisecond budget of a handshake. The two-step isn't bureaucracy — it's the anti-footgun.
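The unknown, pending, live lifecycle can be sketched as pure functions over a map. This is an illustration only: the module name is hypothetical, the registry is a plain map, not ETS, and two behaviours are my assumptions because the lesson does not state them: that put_cert on a host which was never attached is refused, and that a pending host (attached, no cert yet) gets [] from sni just like a stranger.

```elixir
defmodule LifecycleSketch do
  # The registry is a plain map here: host => row.
  def attach(reg, host, tenant, app) do
    Map.put(reg, host, %{tenant: tenant, app: app, status: :pending})
  end

  # Assumption: a cert can only be installed for an already attached host.
  def put_cert(reg, host, certfile, keyfile) do
    case reg do
      %{^host => row} ->
        live = Map.merge(row, %{status: :live, certfile: certfile, keyfile: keyfile})
        {:ok, Map.put(reg, host, live)}

      _ ->
        {:error, :not_attached}
    end
  end

  # Only a live row yields cert options; unknown and pending hosts get [].
  def sni(reg, host) do
    case reg do
      %{^host => %{status: :live, certfile: c, keyfile: k}} -> [certfile: c, keyfile: k]
      _ -> []
    end
  end
end
```

Nothing in this flow issues a certificate during a lookup; sni only ever reads what put_cert stored earlier, which is the ordering the paragraph above argues for.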

the shape in PRODUCTION

workbooks.sh eats this dogfood — this very lesson is served by Workbooks.PublicWeb. The deployed reference is web/deploy/fly.toml for the app wb-site, running the same ghcr.io/workbooks-sh/runtime:latest image you'd run, with a durable volume mounted at /data:

# web/deploy/fly.toml: two planes, two services
[env]
  WB_DATA       = "/data"
  WB_PUBLIC     = "1"          # content plane on
  WB_PUBLIC_APP = "wb-site"    # serves /data/build/public/wb-site
  PUBLIC_PORT   = "4001"
  WB_TENANT     = "wb-site"

[[services]]                   # public: 80/443 → internal 4001 (PublicWeb)
  internal_port      = 4001
  auto_stop_machines = "off"   # this is an origin, not a demo

[[services]]                   # control: port 4000, TLS, locked by WB_PUBLIC_BEARER (authed only)
  internal_port = 4000

Two services, two planes, exactly as the table at the top of this lesson draws them. Public 80/443 maps to the anonymous content plane on 4001; auto_stop_machines is off because an origin can't go to sleep between visitors. The control plane sits on 4000, exposed but locked by a 256-bit WB_PUBLIC_BEARER shared secret — present it and you're in, absent or wrong and it's a flat 401 with no dev fallback. Locked is not the same as hidden; the bearer is covered in the tokens lesson, and the full deploy shape in deployments.

One honest framing about the reverse proxy. There's no proxy between the planes — that's the claim this lesson makes good on. But Cloudflare does sit in front of the whole node as a CDN and edge, and on this deploy Fly terminates the public TLS itself. That's not a contradiction — a lone BEAM VM can't self-defend against a volumetric DDoS, and the plan says so plainly: the edge is optional defense-in-depth, orthogonal to the plane split, not a dependency of it. (A consequence worth noting: because Fly terminates TLS here, the SNI/4443 lane isn't what serves workbooks.sh today. WB_PUBLIC_TLS is the primitive for serving custom domains directly; don't conflate the two.)

what ISN'T built

The honest accounting, all of it verifiable in the source. The plane split is real and shipped; several of the things you'd want around it are still plan, not module.

  • ACME is not built. There is no Workbooks.Acme module — grep returns zero definitions. Automatic cert issuance and renewal is a planned phase. Today certificates are installed manually via put_cert/3. The SNI machinery is real and shipped; what feeds it certs automatically is not.
  • The registry is not durable. The domain table is fresh ETS on every boot — there's no backend-backed reload, so a restart forgets which hosts were attached. The cert files persist on disk, but the rows that map host to app are gone. A durable backing table is in the plan; it isn't implemented.
  • No attach flow yet. Nothing in the runtime calls attach or put_cert except tests — the agent-assisted DNS-and-attach flow is a later phase. The primitives work; the wizard that drives them for you doesn't exist.
  • No rate limiting on the content plane yet. The plan names PlugAttack and Hammer as the intended tools; the checklist item is unchecked.
  • One node is one blast radius. The honest downside of one BEAM node, named in the plan: the planes share a process and a machine, so a crash is a shared crash. The deliberate mitigation is that the modules are kept separable — the content plane could become its own node later with, in the plan's words, zero redesign. That escape hatch is designed in, not yet exercised.

None of this weakens the core claim — an anonymous request can't reach your engine, because the routes aren't there. It just means the custom-domain story is a working primitive with the convenience layer still to come.

questions people actually ASK

Can a public visitor reach my agents or secrets?

No — and not because a rule says no. The content plane has no route to the Dock, no build route, no agent route, no secret-escrow route. Those functions are absent from the module, and non-GET requests have nothing to match at all. You can confirm it by reading one file. There's no path to misconfigure.

Do I need nginx or Caddy in front?

Not between the planes — that's the whole point. One BEAM node runs both surfaces and, via sni_fun, serves any number of custom domains itself. An edge CDN like Cloudflare in front is optional defense-in-depth against volumetric floods, orthogonal to the split. workbooks.sh uses one; your engine doesn't require one.

What's the x-served-by header I keep seeing?

An honesty marker. Every content-plane response carries x-served-by: workbooks-runtime, and every HTML page carries a matching view-source comment before </head>. It's a standing, externally verifiable claim about what served you — run curl -I on any page here and you'll see it.

Custom domain today — what's actually possible?

The primitive works: attach a host, put_cert a certificate and key, and sni_fun presents it on every handshake — one node, many domains. What's not built yet is automatic issuance (no ACME module) and a durable registry (a restart forgets attachments). So custom domains are possible with manually installed certs, today; hands-off issuance is coming.

Why does my unknown test domain fail TLS instead of 404ing?

Because the registry is the admission gate, and it works before HTTP exists. An unregistered host gets [] from sni_fun — no cert, so the handshake can't complete and there's no HTTP layer to return a 404 on. It's a feature: strangers get no session, no response, and no information to probe.

Is the control plane on the public internet?

It can be reachable, but locked is not hidden. On a cloud deploy the control plane is gated by a 256-bit WB_PUBLIC_BEARER — the right bearer gets in, a wrong or absent one gets a flat 401 with no dev fallback. It's a different port and, properly, a different hostname from the content plane, so a published page's JavaScript can't ride its session.

keep GOING

Planes are the HTTP face of the Nexus security model — the inbound side of "isolation is software." Here's where the threads continue.