
where the BYTES actually live

A backend is a storage provider behind one of the runtime's two seams — blob or structured — chosen entirely by config, never a code fork. The same compiled runtime serves a hobbyist with a Fly volume and a company on R2 plus Postgres; the difference is a few lines of env. And because isolation lives above the backend, swapping one can't widen who sees what.


the question every engine INHERITS

The VFS lesson ended with a clean handoff: put the catalog in the file and the warehouse behind the Nexus. But the moment you stand an engine up, you've inherited the oldest question in infrastructure — where do the bytes actually live, and what happens when you outgrow your first answer?

Every platform makes you choose a storage stack on day one and then punishes you for changing your mind. The punishment has a shape: a migration, a fork in the code, a vendor's SDK threaded through every call site so that "switch from a disk to a bucket" means editing the program, not the config. And the scarier version arrives the day you go multi-tenant — does moving to a shared bucket quietly widen who can read what?

This page is about a runtime that refused to answer the question. The code never chose a backend, so you can choose any of them. The same image that a weekend project runs on a single volume is the image a company runs on object storage plus a managed database — and the difference between those two deployments is a handful of environment variables, not a single line of source.

the DEFINITION

back·end /ˈbak·ɛnd/ noun

1. a storage provider sitting behind one of the runtime's two seams — the blob seam for large opaque files, or the structured seam for small queryable rows — selected entirely by config. The runtime code is identical across every deployment; only the env changes.

The load-bearing distinction is those two data classes. Blobs are large, opaque, content-addressed — tenant git repos, .wbundles, signed artifacts, VFS files, sealed ledgers — and they go to an object store or a volume through Workbooks.Storage. Structured data is small, relational, queried — vars, agent memory, telemetry, the command and package registries, tenant metadata — and it goes to Postgres or SQLite through Workbooks.DB. Keeping them apart is the point: you can put blobs on R2 and structured data on Railway, or both on a Fly volume, and the runtime doesn't notice.

one screen of CONFIG

The entire storage and identity posture of a deployment is one file — storage.env.example — and its header states the contract in one sentence: the runtime code is identical across every deployment, only this config changes. You set these as platform secrets; they are never baked into the image. Here is the real surface, abridged, with what each knob flips:

# ── identity ──────────────────────────────────────────
WB_SIGNING_KEY=…        # the Ed25519 seed behind the DID; must survive redeploys
WB_PRIMARY_TENANT=dev   # the default tenant scope

# ── blobs (the Workbooks.Storage seam) ────────────────
WB_STORAGE=local        # local | s3 | r2; one word picks the adapter
WB_DATA=/data           # where local blobs + sqlite + models live
WB_S3_ENDPOINT=…        # s3.us-east-1.amazonaws.com OR <acct>.r2.cloudflarestorage.com
WB_S3_BUCKET=…
WB_S3_KEY=…
WB_S3_SECRET=…
WB_S3_REGION=auto       # us-east-1 for AWS, auto for R2

# ── structured (the Workbooks.DB seam) ────────────────
WB_DATABASE_URL=…       # set = Postgres + pgvector; unset = SQLite

# ── embeddings (a SEPARATE knob from where vectors live) ─
WB_EMBED=local          # hash | local | openrouter; how text becomes a vector
WB_EMBED_MODEL=minishlab/potion-base-8M

Read that file top to bottom and you've read the whole storage design. One word in WB_STORAGE swaps the blob backend. The presence or absence of WB_DATABASE_URL swaps the structured one. And WB_EMBED is deliberately a different knob: it decides how text becomes a vector, not where the vectors live. Where data lives and how text gets embedded are orthogonal axes, with no hidden third one.

two seams, four VERBS

Underneath the config are two tiny interfaces, and their smallness is the whole trick. The blob seam, Workbooks.Storage, is a behaviour with exactly four callbacks — put, get, list, delete — and every one of them takes tenant as its first argument. The structured seam, Workbooks.DB, is a single handle you open and run statements against. That's the entire surface area a backend has to satisfy.
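To make the shape of the blob seam concrete, here is a minimal sketch of a four-callback behaviour and a toy in-memory adapter that satisfies it. Every name here (Sketch.Storage, Sketch.Storage.Memory, the exact return shapes) is hypothetical and chosen for illustration; the real Workbooks.Storage callbacks may differ in arity and return values. The only thing carried over from the text is the rule that tenant comes first in every call.

```elixir
defmodule Sketch.Storage do
  @moduledoc "Hypothetical stand-in for a four-callback blob seam."

  @type tenant :: String.t()
  @type key :: String.t()

  # Tenant is the first argument of every callback.
  @callback put(tenant(), key(), binary()) :: :ok | {:error, term()}
  @callback get(tenant(), key()) :: {:ok, binary()} | {:error, :not_found}
  @callback list(tenant(), prefix :: String.t()) :: {:ok, [key()]}
  @callback delete(tenant(), key()) :: :ok
end

defmodule Sketch.Storage.Memory do
  @moduledoc "Toy adapter: an Agent holding a map keyed by {tenant, key}."
  @behaviour Sketch.Storage

  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  @impl true
  def put(tenant, key, data) do
    Agent.update(__MODULE__, &Map.put(&1, {tenant, key}, data))
  end

  @impl true
  def get(tenant, key) do
    case Agent.get(__MODULE__, &Map.fetch(&1, {tenant, key})) do
      {:ok, data} -> {:ok, data}
      :error -> {:error, :not_found}
    end
  end

  @impl true
  def list(tenant, prefix) do
    keys =
      Agent.get(__MODULE__, fn blobs ->
        # The pinned tenant means other tenants' entries never match.
        for {{^tenant, key}, _data} <- blobs, String.starts_with?(key, prefix), do: key
      end)

    {:ok, Enum.sort(keys)}
  end

  @impl true
  def delete(tenant, key) do
    Agent.update(__MODULE__, &Map.delete(&1, {tenant, key}))
  end
end
```

A new backend, under this sketch, is one more module declaring the same behaviour. Nothing that calls put or get has to change.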

Adapter selection is not a framework, a registry, or a plugin system. It is a five-line case statement on one env var, and it is small enough to quote in full:

case System.get_env("WB_STORAGE") do
  "s3" -> Workbooks.Storage.S3
  "r2" -> Workbooks.Storage.S3      # note: r2 and s3 are the SAME module
  _    -> Workbooks.Storage.Local
end

Adding a provider is one module plus one config line — never a runtime fork. It's the same pattern as the Browse provider slot elsewhere in the system. The flowchart is honest about how little switching there is: code hits a seam, the seam hits a case, the case names a module, and the bytes land wherever that module puts them.

flowchart LR
  code["runtime code<br/>store a bundle · query a var"]
  subgraph seams["two seams: the only interfaces code knows"]
    direction TB
    st["Workbooks.Storage<br/>put · get · list · delete"]
    db["Workbooks.DB<br/>open · execute · query"]
  end
  case1{"WB_STORAGE"}
  case2{"WB_DATABASE_URL?"}
  local["Local: files under WB_DATA"]
  s3["S3 / R2: one module"]
  sqlite["SQLite: a file"]
  pg["Postgres: a URL"]
  code --> st --> case1
  code --> db --> case2
  case1 -- "local" --> local
  case1 -- "s3 / r2" --> s3
  case2 -- "unset" --> sqlite
  case2 -- "set" --> pg
  style seams fill:#fbfaf6,stroke:#121316
  style st fill:#a8d4f0,stroke:#121316
  style db fill:#aee5c2,stroke:#121316
  style code fill:#ffffff,stroke:#121316
  style local fill:#f2ddb0,stroke:#121316
  style s3 fill:#f2ddb0,stroke:#121316
  style sqlite fill:#f3c5a3,stroke:#121316
  style pg fill:#f3c5a3,stroke:#121316

isolation lives ABOVE the backend

Here is the claim that makes swapping backends safe rather than terrifying. Tenant is the first argument of every storage call and every DB call, by construction. Isolation lives above the backend — there is no code path in which the backend decides who sees what. Swapping a Fly volume for R2 cannot widen access, because access was never the backend's job. Backends are interchangeable precisely because they were never trusted with security.

Auth decides who: a JWT, verified via JWKS, scoped to an organizationId that is the tenant. Storage decides nothing — it just stores under whatever scope it was handed. And one small function, safe_key/1, strips the empty string, ., .., and stray slashes from every key path, so a hostile key can never climb out of its tenant prefix on a filesystem backend.
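The key-sanitizing step can be sketched in a few lines. This is a hypothetical re-creation written from the description above, not the runtime's actual safe_key/1: the module name Sketch.Keys and the helper blob_key/2 are invented for illustration, and the sketch assumes keys use forward slashes and that the tenant value already comes from a verified token.

```elixir
defmodule Sketch.Keys do
  @moduledoc "Hypothetical re-creation of a key sanitizer like safe_key/1."

  # Split on "/", drop the segments that are empty or that could move a
  # path upward (".", ".."), and rejoin what is left.
  def safe_key(key) when is_binary(key) do
    key
    |> String.split("/")
    |> Enum.reject(&(&1 in ["", ".", ".."]))
    |> Enum.join("/")
  end

  # The stored location is always tenant prefix + sanitized key, so a
  # hostile key can only ever name something inside its own tenant.
  def blob_key(tenant, key), do: Enum.join([tenant, "blobs", safe_key(key)], "/")
end
```

Under this sketch a key such as ../bob/blobs/secret handed in by tenant alice does not reach bob's prefix. It is flattened into a harmless path nested under alice/blobs/.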

flowchart TD
  jwt["a request: JWT verified via JWKS"]
  scope["tenant = organizationId<br/>the first argument of every call"]
  seam["the seam: put/get/list/delete(tenant, …)<br/>safe_key strips '..' before any adapter runs"]
  local["Local: /data/<tenant>/blobs/<key>"]
  s3["S3 / R2: <tenant>/blobs/<key>"]
  bad["backend decides access"]
  jwt --> scope --> seam
  seam --> local
  seam --> s3
  seam -. "no such code path" .-x bad
  style scope fill:#a8d4f0,stroke:#121316,stroke-width:2.5px
  style seam fill:#fbfaf6,stroke:#121316,stroke-width:2px
  style local fill:#f2ddb0,stroke:#121316
  style s3 fill:#f2ddb0,stroke:#121316
  style bad fill:#d9dbd3,stroke:#121316,stroke-dasharray:4 4
  style jwt fill:#ffffff,stroke:#121316

The blob seam itself is enforced-by-construction rather than proven by a cross-tenant test: its one test is a single-tenant store → fetch → install round-trip through Storage.put/Library.fetch, and there is no end-to-end cross-tenant-denial or ..-escape test on the Local/S3 adapters today. The denial holds because tenant is the first argument of every call and safe_key/1 runs before any adapter — but that is the shape of the code, not a green test, and the honesty section says so.

A separate seam does carry a named cross-tenant test. The brokered KV store — Workbooks.StorageBroker, a guest-facing key/value table on SQLite, distinct from this blob seam — has a case the suite names in capitals: TENANT ISOLATION — a tenant cannot read or overwrite another's keys. Alice and Bob store the same key name with different values; the scoped listing keeps them apart; a cross-tenant read is denied. The same file tests durability across a close-and-reopen and per-tenant quotas. That proves the broker's isolation, not the blob adapters' — the two share the tenant-first shape, but only one is covered end-to-end.

one adapter, every BUCKET

depth rung · skippable — how the blob adapters actually work

The Local adapter is about forty-five lines. Blobs live at <WB_DATA>/<tenant>/blobs/<key>, and the implementation is plain File.write, File.read, File.rm, and a Path.wildcard for listing. The simplest durable deploy on earth is this adapter with WB_DATA mounted on a Fly volume: persistent across redeploys, zero code change — the volume is the storage.
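A rough sketch of what such an adapter could look like, using only the File and Path calls named above. Sketch.Storage.Local is a hypothetical module, not the shipped one: the function shapes, the inline key sanitizer, and the /data fallback for WB_DATA are assumptions made so the example runs on its own.

```elixir
defmodule Sketch.Storage.Local do
  @moduledoc "Hypothetical local adapter: blobs at <WB_DATA>/<tenant>/blobs/<key>."

  def put(tenant, key, data) do
    path = path(tenant, key)
    File.mkdir_p!(Path.dirname(path))
    File.write(path, data)
  end

  def get(tenant, key), do: File.read(path(tenant, key))

  def delete(tenant, key), do: File.rm(path(tenant, key))

  # Returns keys relative to the tenant's own blobs directory.
  def list(tenant) do
    base = Path.join([root(), tenant, "blobs"])

    Path.join(base, "**/*")
    |> Path.wildcard()
    |> Enum.filter(&File.regular?/1)
    |> Enum.map(&Path.relative_to(&1, base))
    |> Enum.sort()
  end

  # Assumption: WB_DATA falls back to /data when unset.
  defp root, do: System.get_env("WB_DATA", "/data")

  # Inline sanitizer (an assumption standing in for safe_key/1): drop
  # empty, "." and ".." segments before the key touches the filesystem.
  defp path(tenant, key) do
    safe = key |> String.split("/") |> Enum.reject(&(&1 in ["", ".", ".."]))
    Path.join([root(), tenant, "blobs" | safe])
  end
end
```

Because the tenant segment is joined into the path before the key, a read under a different tenant looks in a different directory and simply finds nothing there.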

The S3 adapter is the surprising one, because it's only one module and it serves both AWS S3 and Cloudflare R2. R2 is S3-compatible, so the difference is purely config — a different endpoint, a different region. The SigV4 signing is hand-rolled on Erlang's :crypto and :httpc, which means the whole thing ships with no new dependency; the canonical-request and signing-key chain is right there in the source, and the signing function is deliberately left public so a known AWS test vector could verify it — the hook is there, though no such vector test is in the suite yet. Tenant isolation is the identical key prefix — <tenant>/blobs/<key> — enforced above the backend, exactly as on the Local side.
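For readers who have not seen it, the SigV4 signing-key chain is short enough to sketch with nothing but :crypto. The derivation itself follows the published AWS Signature Version 4 scheme; the module name Sketch.SigV4 and its function names are hypothetical, and the canonical request and string-to-sign construction are left out, so this is an illustration of the key chain rather than a working S3 client.

```elixir
defmodule Sketch.SigV4 do
  @moduledoc "Sketch of the SigV4 signing-key chain using Erlang's :crypto."

  defp hmac(key, data), do: :crypto.mac(:hmac, :sha256, key, data)

  # date is "YYYYMMDD". The service is "s3" for AWS S3 and for R2 alike;
  # only the region differs ("us-east-1" versus "auto").
  def signing_key(secret, date, region, service \\ "s3") do
    ("AWS4" <> secret)
    |> hmac(date)
    |> hmac(region)
    |> hmac(service)
    |> hmac("aws4_request")
  end

  # string_to_sign is assumed to be built elsewhere from the canonical request.
  def signature(signing_key, string_to_sign) do
    signing_key |> hmac(string_to_sign) |> Base.encode16(case: :lower)
  end
end
```

A function with this shape is what makes a vector test cheap: feed it a secret, date, region, and service from the AWS documentation and compare the hex output against the published value.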

| adapter | endpoint | region | where <tenant>/blobs/<key> lands |
| --- | --- | --- | --- |
| Local | n/a (filesystem) | n/a | <WB_DATA>/<tenant>/blobs/<key> on disk |
| S3 | s3.us-east-1.amazonaws.com | us-east-1 | an object key in the bucket |
| R2 | <acct>.r2.cloudflarestorage.com | auto | an object key in the bucket (same module as S3) |

The verdict of that table is the one-module fact: Local writes a file, S3 and R2 write an object, and the only thing distinguishing the two cloud rows is two lines of config. The same <tenant>/blobs/<key> shape names the file on disk and the object in the bucket, so the scope you reason about never changes when the destination does.

Postgres is just a URL

The structured seam flips on one question: Workbooks.DB.backend/0 returns :postgres when WB_DATABASE_URL is set and non-empty, and :sqlite otherwise — where SQLite files live at <WB_DATA>/_db/<name>.sqlite. The reason any Postgres provider works is that it's just a connection URL: CrunchyData, Fly PG, Railway, Neon, Supabase are indistinguishable to the runtime. One code path, no per-provider branches.

The entire dialect bridge between the two backends is one regex. Stores write portable SQL with ?1 and ?2 placeholders; for Postgres, the pg/1 helper rewrites them to $1 and $2 — same numbers, different sigil. Rows come back as lists either way, so the store query code is byte-identical regardless of backend. SSL defaults on for any non-localhost host, with ?sslmode=disable in the URL as the opt-out.
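Both halves of that seam fit in a small sketch: the backend decision and the placeholder rewrite. Sketch.DB and prepare/1 are hypothetical names, and the regex is a guess at the simplest version that does the job; the real pg/1 helper may be stricter. One limitation of this naive form is that it would also rewrite a ?1 appearing inside a quoted SQL string literal.

```elixir
defmodule Sketch.DB do
  @moduledoc "Hypothetical sketch of backend selection and the ?N to $N rewrite."

  # :postgres only when the URL is set and non-empty; :sqlite otherwise.
  def backend do
    case System.get_env("WB_DATABASE_URL") do
      url when is_binary(url) and url != "" -> :postgres
      _ -> :sqlite
    end
  end

  # Same numbers, different sigil: ?1 becomes $1, ?2 becomes $2.
  def pg(sql), do: Regex.replace(~r/\?(\d+)/, sql, fn _match, n -> "$" <> n end)

  # Stores write portable SQL once; the seam adapts it per backend.
  def prepare(sql) do
    case backend() do
      :postgres -> pg(sql)
      :sqlite -> sql
    end
  end
end
```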

sequenceDiagram
  participant S as a store
  participant D as Workbooks.DB
  participant Q as the backend
  S->>D: query with ?1 / ?2 placeholders
  alt WB_DATABASE_URL set
    D->>Q: pg/1 rewrites ?1 → $1, opens Postgres (SSL on)
  else unset
    D->>Q: runs as-is against SQLite under WB_DATA/_db
  end
  Q-->>S: rows — as lists, either way
  

Read that exchange as one promise: the store never learns which backend answered. It hands down ?1 placeholders and gets back a list of rows; in between, the seam either translated the sigils and dialed a remote Postgres over SSL, or ran the statement untouched against a local SQLite file. The caller's code is the same in both branches — which is exactly why moving from a file to Neon is a config change, not a rewrite.

the flip you get for FREE

Setting WB_DATABASE_URL doesn't only move tables. It flips semantic search from brute-force to ANN, and you get that upgrade for free. On SQLite, Workbooks.Vector stores vectors as JSON text and computes cosine similarity in Elixir, scanning every row: O(n) per query. This is the live-tested default, and it is genuinely fine for one library's worth of vectors. On Postgres, the engine runs CREATE EXTENSION IF NOT EXISTS vector on first open, stores a real vector column, and ranks in the database with pgvector's <=> cosine-distance operator and an ORDER BY … LIMIT k. One caveat on the word ANN: pgvector answers that query approximately and sub-linearly only when the column carries an HNSW or IVFFlat index; without one it runs the same ranking as an exact scan inside Postgres. The interface is the same either way, so callers never branch.
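The SQLite side of that trade is easy to picture in code. What follows is a minimal sketch of a brute-force cosine search, assuming the rows have already been loaded and scoped to one tenant; Sketch.Vector and its function shapes are hypothetical and much simpler than the real Workbooks.Vector, which also handles storage and JSON decoding.

```elixir
defmodule Sketch.Vector do
  @moduledoc "Hypothetical brute-force search: score every row, keep the top k."

  # Cosine similarity of two equal-length lists of floats.
  def cosine(a, b) do
    dot = Enum.zip(a, b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
    norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
    denom = norm.(a) * norm.(b)

    # A zero vector has no direction; treat its similarity as 0.0.
    if denom == 0.0, do: 0.0, else: dot / denom
  end

  # rows is a list of {id, vector}. Every row is scored, which is the
  # O(n) per-query cost. Returns [{id, score}], best match first.
  def search(rows, query, k) do
    rows
    |> Enum.map(fn {id, vec} -> {id, cosine(vec, query)} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
    |> Enum.take(k)
  end
end
```

The cost is visible in the shape of search/3: the map touches every row before anything is ranked, so doubling the rows doubles the work.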

What's elegant is that the linear cost is made visible rather than hidden. Once a SQLite brute-force search crosses @scan_warn — twenty-five thousand vectors — the engine logs a one-time warning telling the operator exactly what to do:

[warning] Vector: brute-force search over 31204 vectors on SQLite (O(n) per query).
For sub-linear ANN at scale, set WB_DATABASE_URL → pgvector. See docs/VECTOR-QUERY.org.

A work pool would only buy a constant factor — the number of cores — so the real fix at scale is sub-linear ANN, and the warning says so. Then the flip itself is anticlimactic: the same Vector.search(tenant, query, k: 5) call, but on Postgres it becomes … ORDER BY vec <=> $2::vector LIMIT 5, with the score read as 1 - (vec <=> query). The operator's entire migration was one URL.

flowchart TD
  q["Vector.search(tenant, query_vec, k: 5)"]
  br{"pg?()"}
  sq["SQLite: load every row<br/>cosine in Elixir, O(n)<br/>warns past 25,000 vectors"]
  pg["Postgres: ORDER BY vec <=> query::vector LIMIT k<br/>ranked in-DB, sub-linear"]
  q --> br
  br -- "unset" --> sq
  br -- "set" --> pg
  style q fill:#a8d4f0,stroke:#121316
  style sq fill:#f3c5a3,stroke:#121316
  style pg fill:#13d943,stroke:#121316,stroke-width:2.5px
  style br fill:#fbfaf6,stroke:#121316

One last separation worth nailing down: where the vectors live is this knob; how text becomes a vector is the separate WB_EMBED knob — hash for a zero-dependency lexical embedding, local for pure-Elixir static embeddings, or openrouter for a hosted model. The two axes are orthogonal, and the embedding side is the vectors deep-dive's subject, not this page's.

three real POSTURES

Abstraction earns its keep when it collapses to a few concrete choices. There are three realistic postures, and each one is just a few literal env lines — the same runtime image in all three:

# posture 1 — single box (the defaults; mount a Fly volume at /data and you're durable)
WB_STORAGE=local
WB_DATA=/data

# posture 2 — blobs on Cloudflare R2 (same adapter as S3; two lines of difference)
WB_STORAGE=r2
WB_S3_ENDPOINT=https://<account>.r2.cloudflarestorage.com
WB_S3_BUCKET=acme-workbooks
WB_S3_KEY=…
WB_S3_SECRET=…
WB_S3_REGION=auto

# posture 3 — add ANY Postgres; vector search flips to pgvector ANN as a side effect
WB_DATABASE_URL=postgres://user:<password>@<host>/wb?sslmode=require

| posture | what's set | what you get | when to move on |
| --- | --- | --- | --- |
| single box | local + a volume | durable across redeploys, zero deps | when blobs need off-box durability |
| durable blobs | r2 + S3 creds | blobs on object storage, still SQLite for rows | when the vector warning fires at 25k |
| scale | add WB_DATABASE_URL | Postgres rows + pgvector ANN, for free | you're at the top of the ladder |

The honest read of that table: most single-box deploys never leave posture one, and shouldn't. SQLite on a volume is the right answer for the great majority of deployments. You climb the ladder when a specific need shows up — off-box blob durability, then sub-linear vector search — and each rung is additive config, not a migration of the runtime.

the knob that isn't STORAGE

depth rung · skippable — the one env var that's about identity, not bytes

One knob in that file isn't storage at all. WB_SIGNING_KEY is a thirty-two-byte Ed25519 seed, base64-encoded, and it's the seed behind the deployment's DID. Without it, the DID regenerates on every deploy — and the moment the identity changes, every signature made under the old one stops verifying, and sealed ledgers no longer check out. So it has to survive redeploys.

Generating it is a one-liner — elixir -e 'IO.puts(Base.encode64(:crypto.strong_rand_bytes(32)))' — and like every value in this file, it lives in the platform's secret store, set with something like fly secrets set, never baked into the image. That's the same rule the whole file follows: the image is byte-identical everywhere, and the things that differ between deployments are secrets, not source.
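Since a wrong or missing seed silently changes the deployment's identity, a boot-time check is a natural companion to that one-liner. The sketch below is an assumption about how one might validate the value, not a description of what the runtime does: Sketch.SigningKey and its error tuples are hypothetical, and it only checks presence, base64 validity, and length.

```elixir
defmodule Sketch.SigningKey do
  @moduledoc "Hypothetical check that WB_SIGNING_KEY holds a 32-byte base64 seed."

  def load(env \\ "WB_SIGNING_KEY") do
    with value when is_binary(value) <- System.get_env(env),
         {:ok, seed} <- Base.decode64(value),
         32 <- byte_size(seed) do
      {:ok, seed}
    else
      # Each failed step falls through with its own unmatched value.
      nil -> {:error, :missing}
      :error -> {:error, :not_base64}
      n when is_integer(n) -> {:error, {:wrong_length, n}}
    end
  end
end
```

Surfacing {:error, :missing} at startup turns a quiet identity change into a visible configuration error, which is a stricter posture than regenerating the DID.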

where the seam ENDS today

Honesty section. The structured seam is migrating incrementally, and the code says so. Today only the vectors store opens through Workbooks.DB — it's the sole call site. The other structured stores — vars, library, lifecycle, telemetry, the registries — still ride SQLite directly until each is migrated onto the seam. The deploy doc's own status block is candid: the seam, both blob adapters, the BYO-Postgres path, and signing-key persistence are built and tested; what remains is migrating each structured store onto Workbooks.DB incrementally, plus a live S3/R2 and live Postgres round-trip once operator credentials exist.

So calibrate accordingly. The blob side — Local and S3/R2 with hand-rolled SigV4 — has a single-tenant store-fetch-install round-trip test; its cross-tenant denial and the SigV4 signing are enforced by construction and left verifiable (the signing fn is public for an AWS test vector), but neither has a dedicated test in the suite yet. The named cross-tenant test belongs to the separate brokered KV store, not these blob adapters. The SQLite brute-force vector path is the live-tested default. The pgvector path is shape-tested with the live round-trip documented and pending real creds. And two hardening items are explicitly noted but not yet built: per-tenant at-rest blob encryption with a tenant data key, and a microVM boundary for multi-tenant compute isolation.

The anti-hype, said plainly: this page is not an argument to run Postgres. It's an argument that picking the default doesn't lock you in. SQLite plus a volume is genuinely the right answer for most single-box deploys, and the runtime will tell you — at twenty-five thousand vectors — the one moment it isn't.

questions people actually ASK

Do I need Postgres?

No — and the runtime tells you when you might. SQLite on a mounted volume is the default and is fine for most single-box deploys. The one signal to flip WB_DATABASE_URL is the logged warning when a brute-force vector search crosses twenty-five thousand vectors; setting the URL there hands semantic search to pgvector's in-database ANN. Until that moment, you don't need it.

Is R2 different from S3?

Not to the runtime. R2 is S3-compatible, so a single adapter module serves both — in the selection case, r2 resolves to the exact same module as s3. The difference is two config lines: the endpoint (<acct>.r2.cloudflarestorage.com versus s3.us-east-1.amazonaws.com) and the region (auto versus us-east-1).

Can a tenant read another tenant's bucket prefix?

No — by construction. The seam scopes every call above the backend — tenant is the first argument of put, get, list, and delete, and safe_key strips any .. before an adapter runs. The blob adapters have no dedicated cross-tenant test yet; the named test where Alice cannot read Bob's key lives on a separate seam, the brokered KV store (StorageBroker). Swapping the backend can't widen the blob scope either way, because the backend was never the thing deciding access.

Does swapping backends migrate my data?

No — the seam swaps the destination, not the contents. Change WB_STORAGE or set WB_DATABASE_URL and new writes go to the new backend; moving existing bytes over is a separate operation you run. The example file says to set these as platform secrets; it doesn't promise automated data migration, so don't assume one.

Where does the embedding model live?

When WB_EMBED=local, the static embedding matrix — Model2Vec, pure Elixir, no native code — is about thirty megabytes, downloaded once to <WB_DATA>/_models/<id>/ and reused thereafter. That's the embedder knob, separate from where the vectors it produces are stored — see the vectors lesson for the embedding side in full.

What about keeping the live disk in sync?

That's Litestream, and it uses the same s3:// trick: it replicates a WAL-mode SQLite VFS continuously off-box, with a replica URL that's a local file:// path in development and an s3:// bucket on R2 in production. The command is the same; the only difference is the URL. The sync lesson is its home.

keep GOING

This page gave the warehouse a floor. Its neighbors tell you what sits on it.