the RENAME test
Take a workbook — say nike.html, a brand book — and rename it to
final-v2 (3).html. Now rename it again to untitled with no
extension at all. What broke? Nothing. The desktop app still recognises it. The
installer still trusts it. The social card still builds. That refusal to break is the
whole lesson, and the reason is worth more than the trick.
Every other format on your machine knows itself by its name or its first
few bytes. A Word file is .docx; a PDF starts with the literal characters
%PDF-; an executable opens with the ELF magic number. The name or the
magic bytes are the identity, and rename the file wrong and tools get confused.
A workbook can't play that game. Its entire premise is "just an HTML file" — it opens in any browser, it's served by any web host, it survives email and chat and a USB stick. An extension would be a lie the moment someone dropped it, and magic bytes would mean it stopped being plain HTML. So identity had to go somewhere a rename can't reach: inside the file.
the DEFINITION
1. the inline, machine-readable self-description a
workbook carries in a typed <script> tag with the id
workbook-spec — its name, kind, and shape, written where the browser
won't run it. The marker, not the filename, is the format.
This is the canonical reader, said plainly in the runtime itself —
Workbooks.Workbook, the module that defines the format:
# A Workbook is one HTML file with an inline <script id="workbook-spec"> # (the self-description) and the Org source it tangles from. # Format is identified by the inline marker, not the filename.
It is the third reading of the workbook anatomy: the interface is the HTML you see, the code and data ride underneath — and the spec is the small block that lets the whole thing announce what it is to anything that opens it.
anatomy of the MARKER
The marker is three things in one tag: an id that anything can find, a type that keeps the browser from executing it, and a payload that is the record. Here it is, annotated:
The reason it's a <script> and not a <meta> or
an HTML comment is exactly that combination: a multiline structured payload that's
typed, inert, and addressable by id. A <meta> holds a single short
string; a comment is invisible to the DOM. A script with a non-executable type is the
one place HTML lets you stash a whole JSON document — or a whole Org record — that the
page treats as data and the browser leaves untouched.
So the test any reader runs is simply: does this file carry the marker?
flowchart LR f["an .html file
any name, any extension"] q{"carries
script id workbook-spec?"} w["a workbook
readable · catalogable · installable"] p["just a web page
opens and runs — but unrecognised"] f --> q q -- "yes" --> w q -- "no" --> p style f fill:#ffffff,stroke:#121316 style q fill:#fbfaf6,stroke:#121316 style w fill:#a8d4f0,stroke:#121316 style p fill:#f3c5a3,stroke:#121316
24 lines, two LANGUAGES
The format test is small enough to print in full. This is the entire parser — the whole of what makes something a workbook, in the runtime's own words:
def spec(html) when is_binary(html) do
case Regex.run(~r/<script id="workbook-spec"[^>]*>(.*?)<\/script>/s, html) do
[_, json] -> Jason.decode(json)
_ -> {:error, :no_spec}
end
end
That's it. One regex to find the marker, one Jason.decode to read the
payload. On success you get {:ok, %{"toolkit" => "rev"}}; with no marker
you get {:error, :no_spec}. The whole module is 24 lines, and the format
test is two regexes — the second one finds the embedded source block we'll meet later.
The proof the contract is real and not Elixir-specific: the desktop app ports it
byte-for-byte into Rust. Same regex, compiled once and cached — the comment notes the
(?s) dotall flag exists because the spec is pretty-printed JSON:
(?s)<script id="workbook-spec"[^>]*>(.*?)</script>
The Rust side calls it read_spec, returns Option<Value>
— None when the marker is missing or the JSON won't parse — and
exposes it to the desktop UI as a command. Two languages, one contract; the format
survives the port because it was never more than find the marker, read the payload.
Laid side by side:
| runtime (Elixir) | desktop (Rust / Tauri) | |
|---|---|---|
| finds the marker | Workbooks.Workbook.spec/1 | read_spec(path) |
| exposed to callers as | a module function | the workbook_spec_read command |
| returns | {:ok, map} / {:error, :no_spec} | Some(Value) / None |
| when payload won't parse | passes the decode error through | None — missing and malformed collapse |
One honest note the table can't carry: those Elixir functions currently have no callers inside the runtime — the live consumer today is the desktop port and the test suite. The contract is defined and proven; the runtime just hasn't wired its own reader in yet. We'd rather print that than imply more reach than exists.
how a folder becomes a LIBRARY
This is a depth rung — skip it and the next section still lands. But it's where the marker stops being a definition and starts being a behaviour. The desktop app has to turn a folder of files into a library you can browse, and is this a workbook? is the only question it asks. Here's the walk:
flowchart LR dirs["package folders
on disk"] walk["walk for html / htm
cheap extension pre-filter"] scan["par_iter to read_spec
parallel marker scan, rayon"] grid["library entries
path and title"] dirs --> walk --> scan --> grid style dirs fill:#ffffff,stroke:#121316 style walk fill:#f2ddb0,stroke:#121316 style scan fill:#aee5c2,stroke:#121316 style grid fill:#a8d4f0,stroke:#121316
Walk every package folder for .html and .htm files;
parallel-scan each one for the spec marker; only the files that pass
read_spec become library entries. The title comes from the spec's
title field, falling back to the filename when the spec doesn't name one.
So in the running app, "is a workbook" is not a guess and not a list — it is literally
"carries a parseable workbook-spec."
The extension filter is worth a second look, because it's the one place a filename
matters at all. It's not identity — it's economy. Reading a thousand .png
files looking for an HTML marker would be wasteful, so the scan pre-filters on
extension before it ever opens a file. (The desktop's source-resolver does the same: it
rejects non-.html extensions before canonicalising a path.) The extension
narrows the haystack; the marker is still the needle. The kanban mock workbook says so
in its own comment: the spec marker is what the desktop's grid scan and provenance
verifier key on to recognise a workbook.
one marker, two DIALECTS
Same id, two payload languages — both real, both in the wild. The split is honest and worth naming: JSON is the machine contract; text/org is the publishable record.
| JSON spec | Org-text spec | |
|---|---|---|
| type attribute | application/json | text/org |
| payload | a JSON object | an Org headline + PROPERTIES drawer |
| producers | brand books, the kanban mock, toolkits | every live learn page (this one included) |
| read by | the runtime + desktop JSON readers | the learn site's own JS + OG builder |
| the JSON reader on it | decodes cleanly | returns a decode error — wrong dialect |
The JSON dialect is the one the runtime and desktop readers speak: a kanban board's
spec is {"name":"Kanban Board","version":"0.1.0","kind":"app","columns":[…]};
a toolkit's is {"toolkit":"rev"}; a brand book's carries
slug, title, createdAt and a clutch of capture
metadata. Machines read these.
The Org dialect is what every page on this learn site carries — a small CMS record
written in the grammar, with a :PROPERTIES: drawer the
page reads to build its own chrome. The honest tension: that Org payload is not
JSON, so the runtime's JSON reader chokes on it — one marker id, two dialects, and the
JSON parser only speaks one of them. That's not a bug to hide; it's the shape of the
thing. The id says here is this file's self-description; what dialect lives inside
depends on who's meant to read it.
this page runs on its own SPEC
Here is the demo you can verify from the address bar. The page you're reading did not get its headline, its kicker, its read-time, its table of contents, or its social card from hand-written HTML. It got all of them from its own spec block. Edit the record, and the page follows.
This is the parent lesson's real record — the same kind of block this page carries:
<script type="text/org" id="workbook-spec"> * page what is a workbook :PROPERTIES: :SLUG: workbook :KIND: learn :NN: 01 :TITLE1: try emailing :TITLE2: SOMEONE :TITLE3: your app :CHIP: the unit :HERO: img/workbook-hero.jpg :END: ** og - one .html — interface, code and data </script>
The site's JavaScript reads #workbook-spec, pulls the properties with a
small regex, and rebuilds the chrome from them: :NN: and the kicker text
become the hero kicker; :TITLE1/2/3: become the three-line headline
lockup; the chip, and the read-time (word count over 210), and the table of contents
all derive from the record and the real sections. Separately, the social-card builder
reads the spec and never the HTML around it — the ** og bullets under
the headline become the lines printed on the card you saw when this page was shared.
And the learn site finds its own pages the same way: it greps the directory for files
containing id="workbook-spec" — there is no page list, only the marker.
sequenceDiagram participant S as workbook-spec block participant J as learn.js in the browser participant B as og build.py at build time participant P as the page chrome participant C as the social card S->>J: read the spec, parse properties J->>P: kicker, 3-line H1, chip, read-time, TOC S->>B: read the spec, never the HTML around it B->>C: og bullets become the card lines
So: View Source on this page, scroll to the bottom, find the block. Everything framing this lesson came out of it.
the file carries its own MAKING
Another depth rung. The spec says what the file is; a sibling marker can
carry how it was made. The runtime declares the seam plainly:
<script id="workbook-org"> holds the literate Org source the HTML
tangles from, read by org_source/1 — the same regex shape, a plain Org
payload.
The shipping implementation of that idea lives in the brand-book pipeline, and it's more than a seam — it's a hard contract. A brand book carries its whole source tree as a second inert script: the Org files, JSON-serialised, gzipped, base64-encoded:
<script id="workbook-spec" type="application/json">{"slug":"nike","title":"Nike brand book","createdAt":"…"}</script>
<script id="wb-source-bundle" type="application/x-workbook-source"
data-format="json+gzip+base64" data-version="1"
data-file-count="14" data-bundle-size="48210">H4sIAAAA…</script>
Two markers, both with non-executable types, both ignored by the browser — the spec
to identify the file, the source bundle so the file carries its own making. And the
data-version="1" attribute is not decoration: wb unbundle
hard-rejects any tag that lacks it. The war story is the reason it's there. An earlier
pipeline omitted the version attribute, so every shipped book failed to unbundle
— an audit caught it. The fix wasn't a patch at each call site; it was collapsing all
three render paths onto one function that emits the tag, so the contract can never
drift again. One emitter, one shape, no drift. That's the
grammar's lesson applied to a marker: a contract with one home
doesn't rot.
The minimal version of the whole idea is the kanban mock — paste this into any HTML file and it becomes a recognisable workbook:
<script id="workbook-spec" type="application/json">
{"name":"Kanban Board","version":"0.1.0","kind":"app",
"columns":["Backlog","In Progress","Done"]}
</script>
specs under SIGNATURE
A depth rung, and the place the spec stops being convenience and becomes trust. When
a toolkit ships, it ships as a workbook — there is no separate toolkit package
format, said flatly in the runtime: a toolkit IS a workbook. Its spec is just
{"toolkit":"rev"}, riding inside the HTML. And the install path can demand
a signature.
Because the spec lives inside the file's bytes, signing the file signs the spec.
Tamper with the workbook — change a byte of the spec or anything else — and an install
that requires a signature refuses it with {:error, {:bad_signature, …}}.
The marker that identifies the file is the same bytes the signature protects.
sequenceDiagram
participant A as author
participant W as workbook.html, spec inside
participant I as install, require signature
participant R as toolkit registry
A->>W: sign the HTML, spec rides in the signed bytes
W->>I: ship
I->>I: verify signature over the whole file
alt intact
I->>R: register the toolkit
else tampered
I-->>A: bad signature, rejected
end
The signature itself is a separate inline marker — a provenance manifest in its own script block — and how it's built and checked is the signatures lesson's job, not this one. Here the point is narrower and worth holding: the spec is not the signature, but the spec is inside what gets signed.
where the spec STOPS
Honesty section. The spec is a strong idea with real edges, and the page would be dishonest without them.
There is no schema. A spec is an open map. The keys are per-kind convention,
not a validated contract: a kanban board carries name, version,
kind, columns; a toolkit carries toolkit; a brand
book carries slug and capture metadata; a learn page carries Org
properties. Nothing enforces which keys appear. That's flexibility bought at the price
of a guarantee — readers take what they know and ignore the rest.
The runtime reader has no in-repo callers yet. As the readers section said, the desktop port is the live consumer today; the runtime defines the contract but doesn't yet call its own reader. Defined and proven, not yet wired.
It's a regex, not an HTML parser. The marker is extracted by a regular expression, so a literal closing script tag inside a payload would truncate it early. In practice JSON escaping makes this a non-issue — you won't write a raw closing-script tag inside a JSON string — but it's the real constraint, and pretending the parser is more robust than two regexes would be a lie.
The marker gates recognition, not rendering. The published site serves a
workbook's content whether or not it carries a spec — the marker decides whether
something is recognised (scanned, catalogued, installed), never whether it
renders. The proof is one click away: the downloadable
hello-workbook.html carries no spec block at all. It opens and runs
perfectly. It just wouldn't show up in the desktop grid scan, because nothing told the
scan it was a workbook. The file works; it's only unrecognised — and that wrinkle is
the cleanest possible statement of what the spec is for.
questions people actually ASK
What goes in a spec?
Whatever the kind needs — it's an open map, not a fixed schema. A kanban board
carries name, version, kind,
columns; a brand book carries slug, title,
createdAt and capture metadata; a toolkit carries one key,
toolkit; a learn page carries Org properties. Readers take the keys they
understand and ignore the rest.
Do I need one for the file to open?
No. A workbook with no spec opens and runs fine — hello-workbook.html
ships without one and works. You need the spec to be recognised: scanned
into a library, identified by an installer, found by the site's own page discovery.
Recognition, not rendering.
Why a script tag and not a meta tag or a comment?
Because the payload is a multiline structured document, and a script with a
non-executable type is the one place HTML lets you carry that as data: typed, inert,
and addressable by id. A <meta> holds one short string; a comment
is invisible to the DOM. The script tag is the only fit for a whole JSON object — or
a whole Org record — that the browser ships but never runs.
Is the spec what gets signed?
Not by itself — the signature is a separate provenance block. But the spec lives inside the file's bytes, and the signature covers the whole HTML, so signing the file signs the spec along with it. Tamper with either and a signature-required install rejects the whole thing.
JSON or Org — which do I write?
Both are real; pick by who reads it. JSON (type="application/json") is
the machine contract — what the runtime and desktop readers parse. Org
(type="text/org") is the publishable record — what a page renders its own
chrome from. The id is the same; the dialect follows the reader. Note the JSON reader
won't parse the Org dialect, so don't mix audiences in one file.
If I rename the file, does the spec still work?
Yes — that's the entire point. The marker is the format; the filename is a label.
Rename nike.html to anything you like and every reader still finds the
spec inside it. The one place a name matters is a cheap extension pre-filter on folder
scans, which narrows what gets opened — it never decides identity.
keep GOING
The spec is the file's self-knowledge — start with the anatomy it lives in, then follow it outward into bundles, signatures, and the grammar it's written in.