learn / 01·4 — under workbook · specs

the file thatKNOWSwhat it is

A workbook is just an .html file — so how does a desktop app scanning a folder, an installer deciding what to trust, or a website building its own social cards tell it apart from any other web page? Identity moved inside the file. One inert <script> with a known id is the format test, the catalog record, the install identity — and on this site, the record the page renders itself from. The deepest proof is the page you're reading.

the spec11 min read
A small figure standing before a single colossal monolith engraved with one glowing line of self-describing text, the slab dwarfing a wall of unlabelled identical doors behind it — bright, monumental, 1970s sci-fi style

the RENAME test

Take a workbook — say nike.html, a brand book — and rename it to final-v2 (3).html. Now rename it again to untitled with no extension at all. What broke? Nothing. The desktop app still recognises it. The installer still trusts it. The social card still builds. That refusal to break is the whole lesson, and the reason is worth more than the trick.

Every other format on your machine knows itself by its name or its first few bytes. A Word file is .docx; a PDF starts with the literal characters %PDF-; an executable opens with the ELF magic number. The name or the magic bytes are the identity, and rename the file wrong and tools get confused.

A workbook can't play that game. Its entire premise is "just an HTML file" — it opens in any browser, it's served by any web host, it survives email and chat and a USB stick. An extension would be a lie the moment someone dropped it, and magic bytes would mean it stopped being plain HTML. So identity had to go somewhere a rename can't reach: inside the file.

the DEFINITION

spec /spɛk/ noun

1. the inline, machine-readable self-description a workbook carries in a typed <script> tag with the id workbook-spec — its name, kind, and shape, written where the browser won't run it. The marker, not the filename, is the format.

This is the canonical reader, said plainly in the runtime itself — Workbooks.Workbook, the module that defines the format:

# A Workbook is one HTML file with an inline <script id="workbook-spec">
# (the self-description) and the Org source it tangles from.
# Format is identified by the inline marker, not the filename.

It is the third reading of the workbook anatomy: the interface is the HTML you see, the code and data ride underneath — and the spec is the small block that lets the whole thing announce what it is to anything that opens it.

anatomy of the MARKER

The marker is three things in one tag: an id that anything can find, a type that keeps the browser from executing it, and a payload that is the record. Here it is, annotated:

<script id="workbook-spec" type="application/json"> … </script>
idworkbook-spec — the known name every reader greps for; this is the handle, not the filename
typeapplication/json (or text/org) — a non-JS type, so the browser ships the bytes and never runs them: zero runtime cost
payloadthe self-description — name, kind, columns, slug; an open map, different per kind of workbook
a typed script tag is free cargo — structured data the page carries but never executes

The reason it's a <script> and not a <meta> or an HTML comment is exactly that combination: a multiline structured payload that's typed, inert, and addressable by id. A <meta> holds a single short string; a comment is invisible to the DOM. A script with a non-executable type is the one place HTML lets you stash a whole JSON document — or a whole Org record — that the page treats as data and the browser leaves untouched.

So the test any reader runs is simply: does this file carry the marker?

flowchart LR
  f["an .html file
any name, any extension"] q{"carries
script id workbook-spec?"} w["a workbook
readable · catalogable · installable"] p["just a web page
opens and runs — but unrecognised"] f --> q q -- "yes" --> w q -- "no" --> p style f fill:#ffffff,stroke:#121316 style q fill:#fbfaf6,stroke:#121316 style w fill:#a8d4f0,stroke:#121316 style p fill:#f3c5a3,stroke:#121316

24 lines, two LANGUAGES

The format test is small enough to print in full. This is the entire parser — the whole of what makes something a workbook, in the runtime's own words:

def spec(html) when is_binary(html) do
  case Regex.run(~r/<script id="workbook-spec"[^>]*>(.*?)<\/script>/s, html) do
    [_, json] -> Jason.decode(json)
    _ -> {:error, :no_spec}
  end
end

That's it. One regex to find the marker, one Jason.decode to read the payload. On success you get {:ok, %{"toolkit" => "rev"}}; with no marker you get {:error, :no_spec}. The whole module is 24 lines, and the format test is two regexes — the second one finds the embedded source block we'll meet later.

The proof the contract is real and not Elixir-specific: the desktop app ports it byte-for-byte into Rust. Same regex, compiled once and cached — the comment notes the (?s) dotall flag exists because the spec is pretty-printed JSON:

(?s)<script id="workbook-spec"[^>]*>(.*?)</script>

The Rust side calls it read_spec, returns Option<Value>None when the marker is missing or the JSON won't parse — and exposes it to the desktop UI as a command. Two languages, one contract; the format survives the port because it was never more than find the marker, read the payload. Laid side by side:

runtime (Elixir)desktop (Rust / Tauri)
finds the markerWorkbooks.Workbook.spec/1read_spec(path)
exposed to callers asa module functionthe workbook_spec_read command
returns{:ok, map} / {:error, :no_spec}Some(Value) / None
when payload won't parsepasses the decode error throughNone — missing and malformed collapse

One honest note the table can't carry: those Elixir functions currently have no callers inside the runtime — the live consumer today is the desktop port and the test suite. The contract is defined and proven; the runtime just hasn't wired its own reader in yet. We'd rather print that than imply more reach than exists.

how a folder becomes a LIBRARY

This is a depth rung — skip it and the next section still lands. But it's where the marker stops being a definition and starts being a behaviour. The desktop app has to turn a folder of files into a library you can browse, and is this a workbook? is the only question it asks. Here's the walk:

flowchart LR
  dirs["package folders
on disk"] walk["walk for html / htm
cheap extension pre-filter"] scan["par_iter to read_spec
parallel marker scan, rayon"] grid["library entries
path and title"] dirs --> walk --> scan --> grid style dirs fill:#ffffff,stroke:#121316 style walk fill:#f2ddb0,stroke:#121316 style scan fill:#aee5c2,stroke:#121316 style grid fill:#a8d4f0,stroke:#121316

Walk every package folder for .html and .htm files; parallel-scan each one for the spec marker; only the files that pass read_spec become library entries. The title comes from the spec's title field, falling back to the filename when the spec doesn't name one. So in the running app, "is a workbook" is not a guess and not a list — it is literally "carries a parseable workbook-spec."

The extension filter is worth a second look, because it's the one place a filename matters at all. It's not identity — it's economy. Reading a thousand .png files looking for an HTML marker would be wasteful, so the scan pre-filters on extension before it ever opens a file. (The desktop's source-resolver does the same: it rejects non-.html extensions before canonicalising a path.) The extension narrows the haystack; the marker is still the needle. The kanban mock workbook says so in its own comment: the spec marker is what the desktop's grid scan and provenance verifier key on to recognise a workbook.

one marker, two DIALECTS

Same id, two payload languages — both real, both in the wild. The split is honest and worth naming: JSON is the machine contract; text/org is the publishable record.

JSON specOrg-text spec
type attributeapplication/jsontext/org
payloada JSON objectan Org headline + PROPERTIES drawer
producersbrand books, the kanban mock, toolkitsevery live learn page (this one included)
read bythe runtime + desktop JSON readersthe learn site's own JS + OG builder
the JSON reader on itdecodes cleanlyreturns a decode error — wrong dialect

The JSON dialect is the one the runtime and desktop readers speak: a kanban board's spec is {"name":"Kanban Board","version":"0.1.0","kind":"app","columns":[…]}; a toolkit's is {"toolkit":"rev"}; a brand book's carries slug, title, createdAt and a clutch of capture metadata. Machines read these.

The Org dialect is what every page on this learn site carries — a small CMS record written in the grammar, with a :PROPERTIES: drawer the page reads to build its own chrome. The honest tension: that Org payload is not JSON, so the runtime's JSON reader chokes on it — one marker id, two dialects, and the JSON parser only speaks one of them. That's not a bug to hide; it's the shape of the thing. The id says here is this file's self-description; what dialect lives inside depends on who's meant to read it.

this page runs on its own SPEC

Here is the demo you can verify from the address bar. The page you're reading did not get its headline, its kicker, its read-time, its table of contents, or its social card from hand-written HTML. It got all of them from its own spec block. Edit the record, and the page follows.

This is the parent lesson's real record — the same kind of block this page carries:

<script type="text/org" id="workbook-spec">
* page  what is a workbook
  :PROPERTIES:
  :SLUG:   workbook
  :KIND:   learn
  :NN:     01
  :TITLE1: try emailing
  :TITLE2: SOMEONE
  :TITLE3: your app
  :CHIP:   the unit
  :HERO:   img/workbook-hero.jpg
  :END:
** og
- one .html — interface, code and data
</script>

The site's JavaScript reads #workbook-spec, pulls the properties with a small regex, and rebuilds the chrome from them: :NN: and the kicker text become the hero kicker; :TITLE1/2/3: become the three-line headline lockup; the chip, and the read-time (word count over 210), and the table of contents all derive from the record and the real sections. Separately, the social-card builder reads the spec and never the HTML around it — the ** og bullets under the headline become the lines printed on the card you saw when this page was shared. And the learn site finds its own pages the same way: it greps the directory for files containing id="workbook-spec" — there is no page list, only the marker.

sequenceDiagram
  participant S as workbook-spec block
  participant J as learn.js in the browser
  participant B as og build.py at build time
  participant P as the page chrome
  participant C as the social card
  S->>J: read the spec, parse properties
  J->>P: kicker, 3-line H1, chip, read-time, TOC
  S->>B: read the spec, never the HTML around it
  B->>C: og bullets become the card lines
  

So: View Source on this page, scroll to the bottom, find the block. Everything framing this lesson came out of it.

the file carries its own MAKING

Another depth rung. The spec says what the file is; a sibling marker can carry how it was made. The runtime declares the seam plainly: <script id="workbook-org"> holds the literate Org source the HTML tangles from, read by org_source/1 — the same regex shape, a plain Org payload.

The shipping implementation of that idea lives in the brand-book pipeline, and it's more than a seam — it's a hard contract. A brand book carries its whole source tree as a second inert script: the Org files, JSON-serialised, gzipped, base64-encoded:

<script id="workbook-spec" type="application/json">{"slug":"nike","title":"Nike brand book","createdAt":"…"}</script>
<script id="wb-source-bundle" type="application/x-workbook-source"
        data-format="json+gzip+base64" data-version="1"
        data-file-count="14" data-bundle-size="48210">H4sIAAAA…</script>

Two markers, both with non-executable types, both ignored by the browser — the spec to identify the file, the source bundle so the file carries its own making. And the data-version="1" attribute is not decoration: wb unbundle hard-rejects any tag that lacks it. The war story is the reason it's there. An earlier pipeline omitted the version attribute, so every shipped book failed to unbundle — an audit caught it. The fix wasn't a patch at each call site; it was collapsing all three render paths onto one function that emits the tag, so the contract can never drift again. One emitter, one shape, no drift. That's the grammar's lesson applied to a marker: a contract with one home doesn't rot.

The minimal version of the whole idea is the kanban mock — paste this into any HTML file and it becomes a recognisable workbook:

<script id="workbook-spec" type="application/json">
{"name":"Kanban Board","version":"0.1.0","kind":"app",
 "columns":["Backlog","In Progress","Done"]}
</script>

specs under SIGNATURE

A depth rung, and the place the spec stops being convenience and becomes trust. When a toolkit ships, it ships as a workbook — there is no separate toolkit package format, said flatly in the runtime: a toolkit IS a workbook. Its spec is just {"toolkit":"rev"}, riding inside the HTML. And the install path can demand a signature.

Because the spec lives inside the file's bytes, signing the file signs the spec. Tamper with the workbook — change a byte of the spec or anything else — and an install that requires a signature refuses it with {:error, {:bad_signature, …}}. The marker that identifies the file is the same bytes the signature protects.

sequenceDiagram
  participant A as author
  participant W as workbook.html, spec inside
  participant I as install, require signature
  participant R as toolkit registry
  A->>W: sign the HTML, spec rides in the signed bytes
  W->>I: ship
  I->>I: verify signature over the whole file
  alt intact
    I->>R: register the toolkit
  else tampered
    I-->>A: bad signature, rejected
  end
  

The signature itself is a separate inline marker — a provenance manifest in its own script block — and how it's built and checked is the signatures lesson's job, not this one. Here the point is narrower and worth holding: the spec is not the signature, but the spec is inside what gets signed.

where the spec STOPS

Honesty section. The spec is a strong idea with real edges, and the page would be dishonest without them.

There is no schema. A spec is an open map. The keys are per-kind convention, not a validated contract: a kanban board carries name, version, kind, columns; a toolkit carries toolkit; a brand book carries slug and capture metadata; a learn page carries Org properties. Nothing enforces which keys appear. That's flexibility bought at the price of a guarantee — readers take what they know and ignore the rest.

The runtime reader has no in-repo callers yet. As the readers section said, the desktop port is the live consumer today; the runtime defines the contract but doesn't yet call its own reader. Defined and proven, not yet wired.

It's a regex, not an HTML parser. The marker is extracted by a regular expression, so a literal closing script tag inside a payload would truncate it early. In practice JSON escaping makes this a non-issue — you won't write a raw closing-script tag inside a JSON string — but it's the real constraint, and pretending the parser is more robust than two regexes would be a lie.

The marker gates recognition, not rendering. The published site serves a workbook's content whether or not it carries a spec — the marker decides whether something is recognised (scanned, catalogued, installed), never whether it renders. The proof is one click away: the downloadable hello-workbook.html carries no spec block at all. It opens and runs perfectly. It just wouldn't show up in the desktop grid scan, because nothing told the scan it was a workbook. The file works; it's only unrecognised — and that wrinkle is the cleanest possible statement of what the spec is for.

questions people actually ASK

What goes in a spec?

Whatever the kind needs — it's an open map, not a fixed schema. A kanban board carries name, version, kind, columns; a brand book carries slug, title, createdAt and capture metadata; a toolkit carries one key, toolkit; a learn page carries Org properties. Readers take the keys they understand and ignore the rest.

Do I need one for the file to open?

No. A workbook with no spec opens and runs fine — hello-workbook.html ships without one and works. You need the spec to be recognised: scanned into a library, identified by an installer, found by the site's own page discovery. Recognition, not rendering.

Why a script tag and not a meta tag or a comment?

Because the payload is a multiline structured document, and a script with a non-executable type is the one place HTML lets you carry that as data: typed, inert, and addressable by id. A <meta> holds one short string; a comment is invisible to the DOM. The script tag is the only fit for a whole JSON object — or a whole Org record — that the browser ships but never runs.

Is the spec what gets signed?

Not by itself — the signature is a separate provenance block. But the spec lives inside the file's bytes, and the signature covers the whole HTML, so signing the file signs the spec along with it. Tamper with either and a signature-required install rejects the whole thing.

JSON or Org — which do I write?

Both are real; pick by who reads it. JSON (type="application/json") is the machine contract — what the runtime and desktop readers parse. Org (type="text/org") is the publishable record — what a page renders its own chrome from. The id is the same; the dialect follows the reader. Note the JSON reader won't parse the Org dialect, so don't mix audiences in one file.

If I rename the file, does the spec still work?

Yes — that's the entire point. The marker is the format; the filename is a label. Rename nike.html to anything you like and every reader still finds the spec inside it. The one place a name matters is a cheap extension pre-filter on folder scans, which narrows what gets opened — it never decides identity.

keep GOING

The spec is the file's self-knowledge — start with the anatomy it lives in, then follow it outward into bundles, signatures, and the grammar it's written in.