learn / 09·6 — under vfs · foreign tables

when fromREACHESpast the file

A foreign table is an entity name registered to a data-source plugin — so FROM linear-issue stops meaning a table in the file and starts meaning the outside world. One regex, one map, zero new query engine. And for agents, the safer trick is the opposite direction: pull the world in, mirror it to disk as plain text, and never hand over the key.

foreign tables11 min read
A small figure standing at the edge of a sealed white reading-room, reaching one hand through a single doorway in the wall toward a vast bright machine-city of data towers outside — only the doorway connects the two — 1970s sci-fi style, monumental scale, luminous

the disk ends at the file's EDGE

The parent lesson sold a sealed disk, and it meant it: every path bottoms out inside the file, by construction, and nothing the code or the agents do can reach the machine underneath. That's the whole safety story. It's also, eventually, a wall — because the data you actually care about doesn't live in the file. It lives in Linear, in Asana, in Stripe, in some REST endpoint your team stood up last quarter.

Two bad instincts present themselves, and both undo something. The first: hand the agent the API key and a network socket. That demolishes the sealed-disk story the VFS lesson just spent a page building — a steerable agent with a credential and an open port is an exfiltration machine waiting for the wrong instruction. The second: paste exports into the disk by hand. Honest, safe, and stale the moment you save it.

So the missing piece isn't a bigger disk or a looser sandbox. It's a way for FROM to mean past the file when you want it to — without the query engine learning what HTTP is, and without the agent ever touching the wire. That's this lesson.

the DEFINITION

for·eign ta·ble /ˈfɒr·ən ˈteɪ·bəl/ noun

1. an entity name registered by a data-source plugin, so that FROM <entity> resolves past the file — to a live external source or a resident local mirror — instead of to a table inside the workbook's disk. The OQL read face of a federation toolkit.

It is not a new query engine and not a new primitive. It's a routing decision: an unregistered name falls through to the local VFS exactly as before; a registered name is handed to a plugin. The whole construct is one regex, one :persistent_term map, and a one-callback behaviour — and the rest of this page is what those three pieces buy you.

one FROM, two DESTINATIONS

Here is the core mechanic, and it's smaller than you'd guess. When a query arrives, Federation.query runs a single regex over it — \bFROM\s+([A-Za-z0-9_-]+), the first FROM only — to lift the entity name out. Then it asks the registry one question: is this name registered? If yes, the matching plugin's query/3 is called and its rows come back live. If no, it returns a plain atom, :not_federated, and the caller falls through to the local VFS across every workbook member's SQLite store — the same query you'd have run before any of this existed.

flowchart TD
  sql["a query arrives
SELECT … FROM x"] rx["lift the entity from the first FROM
regex · letters, digits, dash, underscore"] reg{"is x registered
in the source map?"} plug["module.query(x, {sql, config}, opts)
the plugin goes and gets it"] live["live rows
federated: true"] nf["return :not_federated"] vfs["fall through — vfs_query across
every member's SQLite disk"] sql --> rx --> reg reg -- "yes" --> plug --> live reg -- "no" --> nf --> vfs style sql fill:#ffffff,stroke:#121316 style rx fill:#ffffff,stroke:#121316 style reg fill:#f2ddb0,stroke:#121316,stroke-width:2.5px style plug fill:#a8d4f0,stroke:#121316 style live fill:#a8d4f0,stroke:#121316 style nf fill:#f3c5a3,stroke:#121316 style vfs fill:#aee5c2,stroke:#121316

Read that graph as a fork with a fallback. The query walks in; the regex pulls one name off the front of FROM; the registry is a yes-or-no gate. Down the yes branch the plugin does all the real work and the result comes back flagged federated: true. Down the no branch nothing exotic happens at all — the literal atom :not_federated rides back to the caller, which quietly does what it always did: query the local disk. The query engine never learned about HTTP, GraphQL, or any external protocol. It learned to route on a name, and a plugin learned the rest.

That fall-through is deliberately the caller's job, not the router's. Library.query tries federation first; an {:ok, rows} becomes a federated result, an {:error, …} surfaces the error, and :not_federated is the signal to go local. One seam, two destinations, and the boundary between them is a single word in the FROM clause.

the whole contract is one CALLBACK

depth rung · skippable — the behaviour, for the curious

A data source is one function. The Workbooks.DataSource behaviour has exactly one callback — query(entity, q, opts) returning {:ok, [map()]} or {:error, term()}. Today q is %{sql: raw, config: cfg}: the plugin gets the raw SQL string and parses what it needs from it, plus its own per-entity config. The rows it returns are headline-shaped maps — the same shape the rest of the system reads.

The registry lives in :persistent_term — rare write, fast read, the same posture as the command registry. Four functions reach it:

callwhat it doesreturns
register(entity, module, config)bind a name to a plugin module + its configthe entity is now foreign
lookup(entity)find the module for a namethe module, or nil
config(entity)fetch that entity's config mapa map (default %{})
entities()list every registered namethe foreign namespace

The verdict of that table: registration is the entire surface. Bind a name with register, and lookup is what the router calls on every query to decide the fork above. config is how per-entity settings — a URL, a rows path, an env key — reach the plugin as q.config. There is no schema to declare, no migration, no adapter interface beyond the one function. That deliberate smallness is why a generic source can make any JSON endpoint a table without writing code at all — which is the next section.

plugins announce themselves in ORG

Nobody calls register by hand. Plugins announce themselves in the grammar. Federation.discover globs toolkits/*/plugin/manifest.org and reads a small keyword set out of each one: #+SLOT (must be data-source), #+ENTITIES (a whitespace-split list of names this source owns), #+IMPL (the Elixir module that implements query/3), plus generic config like #+URL, #+ROWS_PATH, #+ENV_KEYS. The toolkits root is $WB_TOOLKITS_ROOT if it names a real directory, else the in-tree toolkits/.

flowchart TD
  glob["glob toolkits/*/plugin/manifest.org"]
  parse["parse keywords
#+SLOT · #+ENTITIES · #+IMPL · #+URL · …"] slot{"#+SLOT is
data-source?"} load{"module exists +
exports query/3?"} reg["register each entity → module + config"] skip["skip — silently, not an error"] glob --> parse --> slot slot -- "no" --> skip slot -- "yes" --> load load -- "yes" --> reg load -- "no" --> skip style glob fill:#ffffff,stroke:#121316 style parse fill:#ffffff,stroke:#121316 style slot fill:#f2ddb0,stroke:#121316,stroke-width:2.5px style load fill:#f2ddb0,stroke:#121316,stroke-width:2.5px style reg fill:#a8d4f0,stroke:#121316 style skip fill:#d9dbd3,stroke:#121316

Walk that graph and notice the forgiving shape. Discovery globs the manifests, parses the keywords, then checks two gates: is this slot a data source, and does the named module actually exist and export query/3? Pass both and every listed entity registers. Fail either — most importantly, a manifest that names a module which doesn't exist yet — and the source is skipped, not crashed. That's a designed property: a toolkit can ship its declarative manifest before its implementation lands, and discovery treats the gap as "not ready," not as a fatal error.

One honest caveat: discover has no boot call site today. It's idempotent and invoked on demand — by tests, the console, and a planned provision walk — but registration does not happen automatically when the engine starts. You run discovery; you don't inherit it.

any JSON API in six LINES

The only shipped implementation is also the most useful one: Workbooks.DataSource.Http turns any JSON HTTP endpoint into a table with zero code. You write a manifest, not a module. Here is one a reader could drop in today, at toolkits/orders/plugin/manifest.org:

#+TITLE: orders
#+SLOT: data-source
#+ENTITIES: shop-order
#+IMPL: Workbooks.DataSource.Http
#+URL: https://api.example.com/v1/orders
#+ROWS_PATH: data.items
#+ENV_KEYS: SHOP_API_KEY

Run Federation.discover() and shop-order is now a foreign table. Then wbx library query "SELECT * FROM shop-order" returns %{federated: true, rows: [%{member: "federation", rows: […]}], skipped: []} — while the same query against FROM workspace_files falls straight through to the local disk. The #+ROWS_PATH is an optional dotted path that digs into nested JSON: data.items reaches response["data"]["items"]; omit it and the response body is itself taken as the array. Non-map elements are dropped; what survives is your rows.

sequenceDiagram
  participant Q as a query
  participant H as DataSource.Http
  participant E as the host — :httpc
  participant A as api.example.com
  Q->>H: query("shop-order", {sql, config}, opts)
  H->>E: GET the #+URL — host egress, 10s timeout
  E->>A: HTTP request
  A-->>E: 200 · JSON body
  E-->>H: status 200..299 → body
  H->>H: Jason.decode! → dig #+ROWS_PATH → drop non-maps
  H-->>Q: {:ok, rows}
  

Read that exchange as a relay where the plugin never opens a socket. The query calls the Http source; the source asks the host to make the GET — the comment in the code is exact, "the plugin never opens a socket itself" — and the host's :httpc does the egress with a ten-second timeout. A status in 200..299 returns the body; the source decodes it, digs the dotted rows path, drops anything that isn't a map, and hands back the rows. And when it fails, it fails honestly: a 500 comes back as {:error, {:http_status, 500}}, propagated untouched all the way through Federation.query. No URL at all is {:error, :no_url}. Linear and Asana, when they land, are specializations of exactly this pattern — their own query/3 translating OQL into their GraphQL.

one audited path for the KEY

That manifest named #+ENV_KEYS: SHOP_API_KEY, and the way that key reaches the request is the whole trust story. The plugin never calls System.get_env. It calls Workbooks.Plugin.Auth.fetch("SHOP_API_KEY", opts) — a single audited resolution path — and the Http source builds the authorization: Bearer <token> header itself, from the first #+ENV_KEYS entry. Credentials have one home, not a scatter of get_env calls across plugin code.

sequenceDiagram
  participant P as the plugin
  participant A as Plugin.Auth.fetch
  participant R as deploy resolver
  participant V as host env
  participant H as the request
  P->>A: fetch("SHOP_API_KEY", opts)
  A->>R: resolver(key)?  — the keychain seam
  alt resolver returns a value
    R-->>A: token
  else resolver nil or empty
    A->>V: System.get_env(key)
    V-->>A: token or nil
  end
  A-->>P: token (or nil → no header)
  P->>H: authorization: Bearer 
  

Follow that ladder of precedence. The plugin asks Auth.fetch for the key; fetch tries the deploy's resolver first — a (key → binary | nil) function that is the keychain or broker seam — and the resolver's value wins. If the resolver returns nil or an empty string, it falls back to the host environment. If the key is missing everywhere, fetch returns nil and the request still goes — bare, with no auth header at all. The test pins this exactly: it asserts the literal header Bearer sekret when a token is present. The key never enters plugin code as a string the plugin fetched, and — the next section's point — the toolkit that ships the manifest contains no credentials at all.

the second face: a resident MIRROR

Live reads are one face. For agents you usually want the other one. Don't let the agent reach out — run a daemon that pulls the world in, renders the rows as plain org headlines on the disk, and let the agent read the disk like any other workbook. That daemon is Workbooks.Plugin.Sync: a supervised GenServer whose pure, testable core is pull_once — it runs SELECT * FROM <entity> through federation, renders the rows to org, and writes them into the VFS at sync/<entity>.org.

The render is mechanical and worth seeing literally. Given pull_once("shop-order", …) against two rows {"id":1,"title":"alpha"} and {"id":2,"title":"beta"}, the bytes written are exactly:

#+TITLE: shop-order (synced)
#+SYNCED_FROM: shop-order

* alpha :task:
:PROPERTIES:
:ID: 1
:TITLE: alpha
:END:

* beta :task:
:PROPERTIES:
:ID: 2
:TITLE: beta
:END:

It returns {:ok, 2, org}, and the file reads back through VFS.get like anything else. The title for each headline is picked by a cascade — the first present of title, name, summary, id — every key goes into the :PROPERTIES: drawer UPCASED, and any non-scalar value is JSON-encoded. A #+SYNCED_FROM line marks the provenance. The daemon ticks on a default five-minute interval (300_000 ms), scheduled with Process.send_after — and notably it does not pull at boot, so a freshly started daemon has scheduled its first tick but not yet reached out.

sequenceDiagram
  participant T as the 5-min timer
  participant S as Plugin.Sync
  participant F as Federation.query
  participant D as the VFS — workspace volume
  participant Ag as an agent
  T->>S: :pull
  S->>F: SELECT * FROM shop-order
  F-->>S: rows
  S->>S: render → :task: headlines + UPCASED drawer
  S->>D: VFS.put sync/shop-order.org
  Note over Ag,D: the agent only ever reads the file
  Ag->>D: open sync/shop-order.org
  D-->>Ag: org text
  

Read that loop as a one-way conveyor. The timer fires; the daemon runs the federated query; the rows come back; the daemon renders them to :task: headlines and puts them into the VFS — by default into the workspace volume at sync/shop-order.org. The agent, on the far side, only ever opens that file. It reads org text. It never appears anywhere near the query, the timer, or the source. The moduledoc's claim is exact: the mirror is "a resident, queryable local mirror, airgapped from the agent — it only ever writes to the VFS; the agent reads the mirror like any other workbook."

why the mirror is the SAFE face

That one-way conveyor is the entire safety argument, and it's structural, not a policy you trust. The agent never touches the API and never sees the credential. The daemon holds the key; the agent holds nothing. Their only contact surface is org headlines on disk — exactly the grants-not-inheritance instinct from the parent's safety section, now extended one step past the file's edge.

flowchart LR
  subgraph inside["the workbook + VFS — where the agent lives"]
    ag["an agent
even a steerable one"] file["sync/shop-order.org
:task: headlines on disk"] ag -- "reads only" --> file end subgraph outside["the daemon + external API"] d["Plugin.Sync
holds the key · supervised daemon"] api["Linear / Asana / your API"] d -- "pulls, holds LINEAR_API_KEY" --> api end d -- "writes org headlines
the ONLY crossing edge" --> file style ag fill:#a8d4f0,stroke:#121316 style file fill:#aee5c2,stroke:#121316,stroke-width:2.5px style d fill:#f3c5a3,stroke:#121316 style api fill:#d9dbd3,stroke:#121316 style inside fill:#fbfaf6,stroke:#121316 style outside fill:#fbfaf6,stroke:#121316

Picture two zones with one bridge. On the left, inside the workbook and its VFS, the agent reads — and only reads — the synced :task: file. On the right, outside, the daemon holds the credential and pulls from the real API. Exactly one edge crosses between them, and it carries org headlines in one direction. The agent could be fully steered by a hostile instruction and there is still nothing on its side of the bridge but text someone else already fetched.

The structure is what makes that hold. The daemon is a supervised Plugin.Sync GenServer, driven from a :daemon: node in a sync.org face — not something an agent inside the workbook spawns or steers. And the direction is one-way by construction: the generic pull_once only ever reads the source and writes the VFS — there is no write-back path in this core at all. The toolkit that wires the faces together holds no creds itself — its three faces (base, data-source, sync daemon) are one authoring unit, but the credential lives with the host, never the toolkit.

pick your FACE

depth rung · skippable — choosing between the two faces

A federation toolkit gives you two ways to read past the file, and they trade differently. Line them up honestly:

freshnesswhere rows livewho holds the keyagent's contact surfacefailure mode
live read
FROM entity
current — fetched per querynowhere — transienthost (via Auth), at query timethe live API, indirectly{:error, …} surfaces now
resident mirror
the sync daemon
as fresh as the last tick (def 5 min)on disk · sync/<entity>.orgthe daemon, never the agentorg text on disk, onlylast good mirror stays readable
specialized connector
(Linear/Asana — planned)
per its own designeitherhostits own query/3per impl

The verdict in a sentence: reach for the live read when freshness per query matters and the caller is trusted code that can handle an error inline; reach for the resident mirror when an agent is the reader, because it moves the credential and the network entirely out of the agent's world and leaves it nothing but text. The specialized connectors — Linear, Asana, a cursor-based TaskSync — are authored shape with planned implementations; their contract is the same one callback, refined.

what FROM does NOT do

Honesty section, because the FROM trick is easy to oversell. Here is exactly what it does and doesn't push.

No predicate pushdown. The generic HTTP source never reads the SQL. SELECT id FROM x WHERE y = 1 and SELECT * FROM x do the identical thing: GET the URL and return every row. The plugin is handed the raw SQL and could parse it — a specialized connector translating to GraphQL does — but the shipped generic source ignores WHERE and projection entirely.

No cross-source joins. A query goes wholly to a plugin or wholly to the local VFS — the first-FROM-wins regex makes that a hard fork. You cannot join a foreign entity against a local table in one statement.

The mirror is a point-in-time full replace. Each pull writes one org file with VFS.put — an insert-or-replace of the whole file. The generic pull_once has no cursor and no incremental sync; cursor-based upsert is the specialized TaskSync design, and it's planned.

Discovery is on-demand, not automatic. As noted above, discover has no boot call site — registration happens when you run it, not when the engine starts.

Only one implementation ships. Workbooks.DataSource.Http is real and tested; the Linear, Asana, and TaskSync modules do not exist in this repo — and their own manifests say so, admitting the impl is planned. The manifests are fixtures that load_impl skips today, on purpose.

And one naming collision to defuse: wbx mirror and wbx federate in the CLI are about git / Radicle repo federation — a completely different construct that happens to share the words. Nothing on this page is that.

questions people actually ASK

Does WHERE filter on the server?

Not with the generic HTTP source. The plugin is handed the raw SQL but the shipped DataSource.Http ignores it — it GETs the URL and returns every row, so WHERE and column projection are applied locally, after the fetch. A specialized connector can translate the SQL into its own API's query language; the generic one does not.

Can the agent see LINEAR_API_KEY?

No — and not by policy. With the mirror face, the daemon holds the key and the agent only ever reads org headlines off the disk; the credential is never on the agent's side of the bridge. Even on the live-read face the key flows through the single Plugin.Auth path, host-side, never as a get_env call in plugin code.

What if a manifest names a module that doesn't exist?

It's skipped, silently, by design. Discovery checks that the module loads and exports query/3; if it doesn't, the source is treated as not-ready, not as an error. That's what lets a toolkit ship its declarative manifest before its implementation lands.

Where does the mirror land?

In the VFS workspace volume by default, at sync/<entity>.org — for example sync/shop-order.org. You can override the path, but the default puts it right where any other workbook file lives, so the agent reads it with no special knowledge.

Is this the wbx federate thing?

No. wbx mirror / wbx federate are git and Radicle repository federation — sharing the word, nothing else. Foreign tables are about FROM <entity> resolving to a data source, live or mirrored.

How fresh is the mirror?

As fresh as its last tick — the default interval is five minutes (300_000 ms), and the daemon does not pull at boot, so the very first refresh happens one interval after it starts. Each pull is a full replace of the one org file, not an incremental update.

keep GOING

This deep-dive picks up exactly where the VFS lesson's honesty section left off — the disk ended at the file's edge, and this is the seam past it.