the disk ends at the file's EDGE
The parent lesson sold a sealed disk, and it meant it: every path bottoms out inside the file, by construction, and nothing the code or the agents do can reach the machine underneath. That's the whole safety story. It's also, eventually, a wall — because the data you actually care about doesn't live in the file. It lives in Linear, in Asana, in Stripe, in some REST endpoint your team stood up last quarter.
Two bad instincts present themselves, and both undo something. The first: hand the agent the API key and a network socket. That demolishes the sealed-disk story the VFS lesson just spent a page building — a steerable agent with a credential and an open port is an exfiltration machine waiting for the wrong instruction. The second: paste exports into the disk by hand. Honest, safe, and stale the moment you save it.
So the missing piece isn't a bigger disk or a looser sandbox. It's a way
for FROM to mean past the file when you want it to —
without the query engine learning what HTTP is, and without the agent ever
touching the wire. That's this lesson.
the DEFINITION
1. an entity name registered by a
data-source plugin, so that FROM <entity> resolves
past the file — to a live external source or a resident local
mirror — instead of to a table inside the workbook's disk. The
OQL read face of a federation toolkit.
It is not a new query engine and not a new primitive. It's a routing
decision: an unregistered name falls through to the local
VFS exactly as before; a registered name is handed to a
plugin. The whole construct is one regex, one :persistent_term
map, and a one-callback behaviour — and the rest of this page is what those
three pieces buy you.
one FROM, two DESTINATIONS
Here is the core mechanic, and it's smaller than you'd guess. When a
query arrives, Federation.query runs a single regex over it —
\bFROM\s+([A-Za-z0-9_-]+), the first FROM only —
to lift the entity name out. Then it asks the registry one question: is this
name registered? If yes, the matching plugin's query/3 is
called and its rows come back live. If no, it returns a plain atom,
:not_federated, and the caller falls through to the local VFS
across every workbook member's SQLite store — the same query you'd have run
before any of this existed.
flowchart TD sql["a query arrives
SELECT … FROM x"] rx["lift the entity from the first FROM
regex · letters, digits, dash, underscore"] reg{"is x registered
in the source map?"} plug["module.query(x, {sql, config}, opts)
the plugin goes and gets it"] live["live rows
federated: true"] nf["return :not_federated"] vfs["fall through — vfs_query across
every member's SQLite disk"] sql --> rx --> reg reg -- "yes" --> plug --> live reg -- "no" --> nf --> vfs style sql fill:#ffffff,stroke:#121316 style rx fill:#ffffff,stroke:#121316 style reg fill:#f2ddb0,stroke:#121316,stroke-width:2.5px style plug fill:#a8d4f0,stroke:#121316 style live fill:#a8d4f0,stroke:#121316 style nf fill:#f3c5a3,stroke:#121316 style vfs fill:#aee5c2,stroke:#121316
Read that graph as a fork with a fallback. The query walks in; the regex
pulls one name off the front of FROM; the registry is a yes-or-no
gate. Down the yes branch the plugin does all the real work and the result
comes back flagged federated: true. Down the no branch nothing
exotic happens at all — the literal atom :not_federated rides
back to the caller, which quietly does what it always did: query the local
disk. The query engine never learned about HTTP, GraphQL, or any external
protocol. It learned to route on a name, and a plugin learned the rest.
That fall-through is deliberately the caller's job, not the router's.
Library.query tries federation first; an {:ok, rows}
becomes a federated result, an {:error, …} surfaces the error,
and :not_federated is the signal to go local. One seam, two
destinations, and the boundary between them is a single word in the
FROM clause.
the whole contract is one CALLBACK
depth rung · skippable — the behaviour, for the curious
A data source is one function. The Workbooks.DataSource
behaviour has exactly one callback —
query(entity, q, opts) returning {:ok, [map()]} or
{:error, term()}. Today q is
%{sql: raw, config: cfg}: the plugin gets the raw SQL string
and parses what it needs from it, plus its own per-entity config. The rows
it returns are headline-shaped maps — the same shape the rest of the system
reads.
The registry lives in :persistent_term — rare write, fast
read, the same posture as the command registry. Four functions reach it:
| call | what it does | returns |
|---|---|---|
register(entity, module, config) | bind a name to a plugin module + its config | the entity is now foreign |
lookup(entity) | find the module for a name | the module, or nil |
config(entity) | fetch that entity's config map | a map (default %{}) |
entities() | list every registered name | the foreign namespace |
The verdict of that table: registration is the entire surface. Bind a name
with register, and lookup is what the router calls
on every query to decide the fork above. config is how
per-entity settings — a URL, a rows path, an env key — reach the plugin as
q.config. There is no schema to declare, no migration, no
adapter interface beyond the one function. That deliberate smallness is why
a generic source can make any JSON endpoint a table without writing
code at all — which is the next section.
plugins announce themselves in ORG
Nobody calls register by hand. Plugins announce themselves in
the grammar. Federation.discover globs
toolkits/*/plugin/manifest.org and reads a small keyword set out
of each one: #+SLOT (must be data-source),
#+ENTITIES (a whitespace-split list of names this source owns),
#+IMPL (the Elixir module that implements query/3),
plus generic config like #+URL, #+ROWS_PATH,
#+ENV_KEYS. The toolkits root is $WB_TOOLKITS_ROOT
if it names a real directory, else the in-tree toolkits/.
flowchart TD glob["glob toolkits/*/plugin/manifest.org"] parse["parse keywords
#+SLOT · #+ENTITIES · #+IMPL · #+URL · …"] slot{"#+SLOT is
data-source?"} load{"module exists +
exports query/3?"} reg["register each entity → module + config"] skip["skip — silently, not an error"] glob --> parse --> slot slot -- "no" --> skip slot -- "yes" --> load load -- "yes" --> reg load -- "no" --> skip style glob fill:#ffffff,stroke:#121316 style parse fill:#ffffff,stroke:#121316 style slot fill:#f2ddb0,stroke:#121316,stroke-width:2.5px style load fill:#f2ddb0,stroke:#121316,stroke-width:2.5px style reg fill:#a8d4f0,stroke:#121316 style skip fill:#d9dbd3,stroke:#121316
Walk that graph and notice the forgiving shape. Discovery globs the
manifests, parses the keywords, then checks two gates: is this slot a data
source, and does the named module actually exist and export
query/3? Pass both and every listed entity registers. Fail
either — most importantly, a manifest that names a module which doesn't
exist yet — and the source is skipped, not crashed. That's a designed
property: a toolkit can ship its declarative manifest before its
implementation lands, and discovery treats the gap as "not ready," not as a
fatal error.
One honest caveat: discover has no boot call site today. It's
idempotent and invoked on demand — by tests, the console, and a planned
provision walk — but registration does not happen automatically when the
engine starts. You run discovery; you don't inherit it.
any JSON API in six LINES
The only shipped implementation is also the most useful one:
Workbooks.DataSource.Http turns any JSON HTTP endpoint into a
table with zero code. You write a manifest, not a module. Here is one a
reader could drop in today, at toolkits/orders/plugin/manifest.org:
#+TITLE: orders #+SLOT: data-source #+ENTITIES: shop-order #+IMPL: Workbooks.DataSource.Http #+URL: https://api.example.com/v1/orders #+ROWS_PATH: data.items #+ENV_KEYS: SHOP_API_KEY
Run Federation.discover() and shop-order is now
a foreign table. Then wbx library query "SELECT * FROM shop-order"
returns %{federated: true, rows: [%{member: "federation", rows: […]}],
skipped: []} — while the same query against
FROM workspace_files falls straight through to the local disk.
The #+ROWS_PATH is an optional dotted path that digs into nested
JSON: data.items reaches response["data"]["items"];
omit it and the response body is itself taken as the array. Non-map elements
are dropped; what survives is your rows.
sequenceDiagram
participant Q as a query
participant H as DataSource.Http
participant E as the host — :httpc
participant A as api.example.com
Q->>H: query("shop-order", {sql, config}, opts)
H->>E: GET the #+URL — host egress, 10s timeout
E->>A: HTTP request
A-->>E: 200 · JSON body
E-->>H: status 200..299 → body
H->>H: Jason.decode! → dig #+ROWS_PATH → drop non-maps
H-->>Q: {:ok, rows}
Read that exchange as a relay where the plugin never opens a socket. The
query calls the Http source; the source asks the host to make the
GET — the comment in the code is exact, "the plugin never opens a socket
itself" — and the host's :httpc does the egress with a
ten-second timeout. A status in 200..299 returns the body; the
source decodes it, digs the dotted rows path, drops anything that isn't a
map, and hands back the rows. And when it fails, it fails honestly: a 500
comes back as {:error, {:http_status, 500}}, propagated
untouched all the way through Federation.query. No URL at all
is {:error, :no_url}. Linear and Asana, when they land, are
specializations of exactly this pattern — their own query/3
translating OQL into their GraphQL.
one audited path for the KEY
That manifest named #+ENV_KEYS: SHOP_API_KEY, and the way that
key reaches the request is the whole trust story. The plugin never calls
System.get_env. It calls
Workbooks.Plugin.Auth.fetch("SHOP_API_KEY", opts) — a single
audited resolution path — and the Http source builds the
authorization: Bearer <token> header itself, from the
first #+ENV_KEYS entry. Credentials have one home, not a scatter
of get_env calls across plugin code.
sequenceDiagram
participant P as the plugin
participant A as Plugin.Auth.fetch
participant R as deploy resolver
participant V as host env
participant H as the request
P->>A: fetch("SHOP_API_KEY", opts)
A->>R: resolver(key)? — the keychain seam
alt resolver returns a value
R-->>A: token
else resolver nil or empty
A->>V: System.get_env(key)
V-->>A: token or nil
end
A-->>P: token (or nil → no header)
P->>H: authorization: Bearer
Follow that ladder of precedence. The plugin asks Auth.fetch
for the key; fetch tries the deploy's resolver first — a
(key → binary | nil) function that is the keychain or broker
seam — and the resolver's value wins. If the resolver returns nil
or an empty string, it falls back to the host environment. If the key is
missing everywhere, fetch returns nil and the
request still goes — bare, with no auth header at all. The test pins this
exactly: it asserts the literal header Bearer sekret when a
token is present. The key never enters plugin code as a string the plugin
fetched, and — the next section's point — the toolkit that ships the
manifest contains no credentials at all.
the second face: a resident MIRROR
Live reads are one face. For agents you usually want the other one. Don't
let the agent reach out — run a daemon that pulls the world in,
renders the rows as plain org headlines on the disk, and let the agent read
the disk like any other workbook. That daemon is
Workbooks.Plugin.Sync: a supervised GenServer whose pure,
testable core is pull_once — it runs
SELECT * FROM <entity> through federation, renders the
rows to org, and writes them into the VFS at
sync/<entity>.org.
The render is mechanical and worth seeing literally. Given
pull_once("shop-order", …) against two rows
{"id":1,"title":"alpha"} and {"id":2,"title":"beta"},
the bytes written are exactly:
#+TITLE: shop-order (synced) #+SYNCED_FROM: shop-order * alpha :task: :PROPERTIES: :ID: 1 :TITLE: alpha :END: * beta :task: :PROPERTIES: :ID: 2 :TITLE: beta :END:
It returns {:ok, 2, org}, and the file reads back through
VFS.get like anything else. The title for each headline is
picked by a cascade — the first present of
title, name, summary, id
— every key goes into the :PROPERTIES: drawer UPCASED, and any
non-scalar value is JSON-encoded. A #+SYNCED_FROM line marks the
provenance. The daemon ticks on a default five-minute interval
(300_000 ms), scheduled with Process.send_after —
and notably it does not pull at boot, so a freshly started daemon
has scheduled its first tick but not yet reached out.
sequenceDiagram participant T as the 5-min timer participant S as Plugin.Sync participant F as Federation.query participant D as the VFS — workspace volume participant Ag as an agent T->>S: :pull S->>F: SELECT * FROM shop-order F-->>S: rows S->>S: render → :task: headlines + UPCASED drawer S->>D: VFS.put sync/shop-order.org Note over Ag,D: the agent only ever reads the file Ag->>D: open sync/shop-order.org D-->>Ag: org text
Read that loop as a one-way conveyor. The timer fires; the daemon runs the
federated query; the rows come back; the daemon renders them to
:task: headlines and puts them into the VFS — by
default into the workspace volume at sync/shop-order.org. The
agent, on the far side, only ever opens that file. It reads org text. It
never appears anywhere near the query, the timer, or the source. The
moduledoc's claim is exact: the mirror is "a resident, queryable local
mirror, airgapped from the agent — it only ever writes to the VFS; the agent
reads the mirror like any other workbook."
why the mirror is the SAFE face
That one-way conveyor is the entire safety argument, and it's structural, not a policy you trust. The agent never touches the API and never sees the credential. The daemon holds the key; the agent holds nothing. Their only contact surface is org headlines on disk — exactly the grants-not-inheritance instinct from the parent's safety section, now extended one step past the file's edge.
flowchart LR
subgraph inside["the workbook + VFS — where the agent lives"]
ag["an agent
even a steerable one"]
file["sync/shop-order.org
:task: headlines on disk"]
ag -- "reads only" --> file
end
subgraph outside["the daemon + external API"]
d["Plugin.Sync
holds the key · supervised daemon"]
api["Linear / Asana / your API"]
d -- "pulls, holds LINEAR_API_KEY" --> api
end
d -- "writes org headlines
the ONLY crossing edge" --> file
style ag fill:#a8d4f0,stroke:#121316
style file fill:#aee5c2,stroke:#121316,stroke-width:2.5px
style d fill:#f3c5a3,stroke:#121316
style api fill:#d9dbd3,stroke:#121316
style inside fill:#fbfaf6,stroke:#121316
style outside fill:#fbfaf6,stroke:#121316
Picture two zones with one bridge. On the left, inside the workbook and
its VFS, the agent reads — and only reads — the synced
:task: file. On the right, outside, the daemon holds the
credential and pulls from the real API. Exactly one edge crosses between
them, and it carries org headlines in one direction. The agent could be
fully steered by a hostile instruction and there is still nothing on its
side of the bridge but text someone else already fetched.
The structure is what makes that hold. The daemon is a supervised
Plugin.Sync GenServer, driven from a :daemon: node in
a sync.org face — not something an agent inside the workbook
spawns or steers. And the direction is one-way by construction: the generic
pull_once only ever reads the source and writes
the VFS — there is no write-back path in this core at all. The toolkit that
wires the faces together holds no creds itself — its three faces (base,
data-source, sync daemon) are one authoring unit, but the credential lives
with the host, never the toolkit.
pick your FACE
depth rung · skippable — choosing between the two faces
A federation toolkit gives you two ways to read past the file, and they trade differently. Line them up honestly:
| freshness | where rows live | who holds the key | agent's contact surface | failure mode | |
|---|---|---|---|---|---|
live readFROM entity | current — fetched per query | nowhere — transient | host (via Auth), at query time | the live API, indirectly | {:error, …} surfaces now |
| resident mirror the sync daemon | as fresh as the last tick (def 5 min) | on disk · sync/<entity>.org | the daemon, never the agent | org text on disk, only | last good mirror stays readable |
| specialized connector (Linear/Asana — planned) | per its own design | either | host | its own query/3 | per impl |
The verdict in a sentence: reach for the live read when freshness per query matters and the caller is trusted code that can handle an error inline; reach for the resident mirror when an agent is the reader, because it moves the credential and the network entirely out of the agent's world and leaves it nothing but text. The specialized connectors — Linear, Asana, a cursor-based TaskSync — are authored shape with planned implementations; their contract is the same one callback, refined.
what FROM does NOT do
Honesty section, because the FROM trick is easy to oversell.
Here is exactly what it does and doesn't push.
No predicate pushdown. The generic HTTP source never reads the SQL.
SELECT id FROM x WHERE y = 1 and SELECT * FROM x do
the identical thing: GET the URL and return every row. The plugin is handed
the raw SQL and could parse it — a specialized connector translating
to GraphQL does — but the shipped generic source ignores WHERE and projection
entirely.
No cross-source joins. A query goes wholly to a plugin or wholly to
the local VFS — the first-FROM-wins regex makes that a hard
fork. You cannot join a foreign entity against a local table in one
statement.
The mirror is a point-in-time full replace. Each pull writes one
org file with VFS.put — an insert-or-replace of the whole file.
The generic pull_once has no cursor and no incremental sync;
cursor-based upsert is the specialized TaskSync design, and it's planned.
Discovery is on-demand, not automatic. As noted above,
discover has no boot call site — registration happens when you
run it, not when the engine starts.
Only one implementation ships. Workbooks.DataSource.Http
is real and tested; the Linear, Asana, and TaskSync modules do not exist in
this repo — and their own manifests say so, admitting the impl is planned.
The manifests are fixtures that load_impl skips today, on
purpose.
And one naming collision to defuse: wbx mirror and
wbx federate in the CLI are about git / Radicle repo
federation — a completely different construct that happens to share the
words. Nothing on this page is that.
questions people actually ASK
Does WHERE filter on the server?
Not with the generic HTTP source. The plugin is handed the raw SQL but
the shipped DataSource.Http ignores it — it GETs the URL and
returns every row, so WHERE and column projection are applied
locally, after the fetch. A specialized connector can translate the SQL
into its own API's query language; the generic one does not.
Can the agent see LINEAR_API_KEY?
No — and not by policy. With the mirror face, the daemon holds the key
and the agent only ever reads org headlines off the disk; the credential
is never on the agent's side of the bridge. Even on the live-read face the
key flows through the single Plugin.Auth path, host-side,
never as a get_env call in plugin code.
What if a manifest names a module that doesn't exist?
It's skipped, silently, by design. Discovery checks that the module
loads and exports query/3; if it doesn't, the source is
treated as not-ready, not as an error. That's what lets a toolkit ship its
declarative manifest before its implementation lands.
Where does the mirror land?
In the VFS workspace volume by default, at
sync/<entity>.org — for example
sync/shop-order.org. You can override the path, but the
default puts it right where any other workbook file lives, so the agent
reads it with no special knowledge.
Is this the wbx federate thing?
No. wbx mirror / wbx federate are git and
Radicle repository federation — sharing the word, nothing else. Foreign
tables are about FROM <entity> resolving to a data
source, live or mirrored.
How fresh is the mirror?
As fresh as its last tick — the default interval is five minutes
(300_000 ms), and the daemon does not pull at boot, so the
very first refresh happens one interval after it starts. Each pull is a
full replace of the one org file, not an incremental update.
keep GOING
This deep-dive picks up exactly where the VFS lesson's honesty section left off — the disk ended at the file's edge, and this is the seam past it.