the web, from inside a SANDBOX
Sooner or later your software needs the web. An agent has to research a question; an app has to fetch a page; a job has to crawl a docs site. The trouble is that every familiar way to grant that need is a bad one — and the lesson this page lives under spelled out exactly why.
You could give the workload raw network. But a workload here is often a
model, and the Nexus security section named the
failure mode plainly: anyone whose text reaches an agent is partially steering
it. A prompt-injected agent with open egress is an exfiltration machine: it
reads a hostile web page and POSTs your secrets somewhere in the same breath.
You could shell out to curl — except this runtime deliberately has
no native exec; there is no shell to reach. Or you could hard-wire a paid
scraping API into every caller — a key in every script, a vendor in every line
of code, and a credential sitting one prompt-injection away from a stranger.
Three bad answers, one shared mistake: they all put the web inside the workload. This lesson is about putting it one layer down — in the engine, where the workload can ask for the web without ever holding it.
the DEFINITION
1. the runtime's web capability: three verbs — fetch one page, crawl a set or a site, search the open web — fulfilled by the engine on the workload's behalf, returned as structured data, with a free native browser as the default and the provider for each verb a config value.
One sentence per verb. Fetch takes a URL and gives back a parsed page.
Crawl takes a list of URLs or a single seed and gives back many pages —
concurrently, breadth-first, same-host. Search takes a query and gives
back results. The workload never opens a socket for any of them. Workbooks.Browse
is a general runtime primitive, not a feature of any one app — brandnana's
harvest is simply one of its callers.
one slot, three VERBS
Browse is a dispatcher. Each verb resolves its own provider, independently, the moment it's called. The rule is small and worth knowing exactly, because it is what makes "swap the provider" a one-line change:
- For search, an explicitly configured search_provider wins outright.
- Otherwise, the configured provider handles the verb, but only if it declares it. A provider advertises what it can do through a capabilities/0 function; Browse checks cap in mod.capabilities() before routing.
- If the configured provider doesn't declare the verb, Browse falls back to the free Native browser, which declares all three.
And the proxy is orthogonal. Whatever provider resolves, Browse merges the
configured :proxy into the call's options unless the caller already
set one. Choosing a provider and choosing a proxy are two separate knobs that
don't interfere.
flowchart TD
call["a caller asks Browse
fetch · crawl · search"]
s{"verb is search
and a search_provider
is configured?"}
d{"configured provider
declares this verb
via capabilities/0?"}
sp["the search provider
e.g. a SERP API"]
cp["the configured provider
e.g. Firecrawl"]
nat["Native — the free
built-in browser"]
px["merge the configured proxy
unless the caller set one"]
call --> s
s -- "yes" --> sp
s -- "no" --> d
d -- "yes" --> cp
d -- "no" --> nat
sp --> px
cp --> px
nat --> px
px --> out["the verb runs — same page shape back"]
style call fill:#ffffff,stroke:#121316
style nat fill:#aee5c2,stroke:#121316,stroke-width:2.5px
style sp fill:#f2ddb0,stroke:#121316
style cp fill:#f2ddb0,stroke:#121316
style px fill:#fbfaf6,stroke:#121316
style out fill:#ffffff,stroke:#121316
Read the graph top to bottom as the dispatcher's whole decision. A call comes in; if it's a search and you named a search provider, that wins and we're done. Otherwise we ask the one question that matters — does the configured provider declare this verb? If yes, it runs; if no, Native catches it, because Native declares everything. Then, on every path, the proxy gets merged in. The free browser is the floor nothing falls below.
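The resolution rule is small enough to sketch in a few lines. What follows is a minimal illustration under stated assumptions, not the engine's source: the module names DispatchSketch, Native, ScrapeOnly, and Serp are hypothetical, and so is passing the config as a keyword list. Only capabilities/0, the search_provider and provider settings, the :proxy option, and the fallback to Native come from the text above.

```elixir
defmodule DispatchSketch do
  # Hypothetical stand-ins for real providers; each declares its verbs.
  defmodule Native do
    def capabilities, do: [:fetch, :crawl, :search]
  end

  defmodule ScrapeOnly do
    def capabilities, do: [:fetch]
  end

  defmodule Serp do
    def capabilities, do: [:search]
  end

  # Resolve the provider for one verb, at call time.
  def resolve(verb, config) do
    search_provider = Keyword.get(config, :search_provider)
    provider = Keyword.get(config, :provider, Native)

    cond do
      # 1. An explicit search provider wins outright for search.
      verb == :search and search_provider != nil -> search_provider
      # 2. The configured provider runs the verb only if it declares it.
      verb in provider.capabilities() -> provider
      # 3. Otherwise Native catches it, because Native declares everything.
      true -> Native
    end
  end

  # Merge the configured proxy unless the caller already set one.
  def with_proxy(opts, config) do
    case Keyword.fetch(config, :proxy) do
      {:ok, proxy} -> Keyword.put_new(opts, :proxy, proxy)
      :error -> opts
    end
  end
end
```

Because resolve/2 and with_proxy/2 never consult each other, the sketch also shows why the provider and the proxy are independent knobs: changing one cannot alter what the other returns.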
the free BROWSER
The default provider is Native: free, in-engine, no keys, no external
service. It's pure BEAM — Erlang's own :ssl and :httpc
doing the fetching, a lightweight extractor turning HTML into structure, and a
concurrent crawler — and it's built from three small bricks.
| brick | what it does | real defaults |
|---|---|---|
| Fetch | pure-Erlang :httpc + :ssl GET — no Rust, no port, no sidecar — following redirects by hand | 20s timeout · 5 redirects max |
| Extract | zero-dependency regex + tag-strip → title, meta/OpenGraph, h1–h3, links, readable text | headings capped at 60 · text at 4000 chars |
| Crawl | fan-out over a URL list, or breadth-first from one seed, on Task.async_stream | concurrency 8 · 25 pages · depth 2 |
The verdict of that table: every default is conservative on purpose. A fetch
that 2xx-es is parsed into a page; a non-2xx comes back as a clean
{:http_status, code} error rather than a guess. A crawl dispatches
on shape — hand it a list of URLs and it fetches them concurrently; hand it a
single seed string and it walks the site breadth-first. No part of this asks for
a credential, because no part of it talks to anyone but the origin server.
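Two of those behaviors, the status handling and the dispatch on shape, fit naturally into function clauses. The sketch below is illustrative only: NativeShapeSketch, classify/2, and crawl_mode/1 are hypothetical names, and wrapping the result in {:ok, _} and {:error, _} tuples is an assumption; the text above only states that a non-2xx yields {:http_status, code}.

```elixir
defmodule NativeShapeSketch do
  # A 2xx response goes on to parsing; anything else is a tagged error
  # carrying the status code, never a guess at the content.
  def classify(status, body) when status in 200..299, do: {:ok, body}
  def classify(status, _body), do: {:error, {:http_status, status}}

  # Crawl dispatches on the shape of its argument:
  # a list of URLs is fetched concurrently, a single seed is walked breadth-first.
  def crawl_mode(urls) when is_list(urls), do: :pages
  def crawl_mode(seed) when is_binary(seed), do: :site
end
```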
dressing like a BROWSER
depth rung · skippable — the genuinely surprising bit, for the curious
Here's the part that surprises people. A bare HTTP client looks nothing like a
browser, and many sites notice. So Fetch shapes its TLS handshake to resemble
one — and it does it from pure Elixir, no native dependency. There are
named handshake profiles — :default, :chrome,
:safari — and the default profile is :chrome.
The Chrome and Safari profiles force the version posture a real browser ships
with — versions: [:"tlsv1.3", :"tlsv1.2"] — reorder the cipher list,
and set the elliptic curves a browser offers: eccs: [:x25519, :secp256r1,
:secp384r1]. On top of that ride real browser request headers: genuine
Chrome 124 and Safari 17.4 user-agent strings, with matching accept and
accept-language lines. A spike proved the point that matters — the same endpoint
completes under default TLS 1.3 and under forced TLS 1.2 with reordered
ciphers. The ClientHello is ours to shape, from the BEAM, with no sidecar.
| | :default | :chrome (the default profile) |
|---|---|---|
| TLS versions | library default | 1.3 then 1.2, forced |
| cipher order | library default | reordered to match a browser |
| curves (eccs) | library default | x25519 · secp256r1 · secp384r1 |
| user-agent | browser string | real Chrome 124 |
The honest limit is stated in the code itself: these profiles approximate a browser's version and cipher posture as it stands today. A byte-exact fingerprint encoder — the extension ordering and GREASE values that produce a matching JA3/JA4 hash — is future work, not a present claim. What exists today is real control over the handshake; what doesn't yet is pixel-for-pixel mimicry.
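As a rough illustration of what a profile amounts to, here is a sketch that maps a profile name to Erlang :ssl client options. The versions and eccs values are the ones quoted above; the module name TlsProfileSketch and the function ssl_opts/1 are hypothetical, and the cipher reordering is left out because the text does not give the exact order.

```elixir
defmodule TlsProfileSketch do
  # Values quoted in the text above.
  @browser_versions [:"tlsv1.3", :"tlsv1.2"]
  @browser_curves [:x25519, :secp256r1, :secp384r1]

  # :default leaves every setting to the library.
  def ssl_opts(:default), do: []

  # The browser profiles force the version posture and offered curves.
  # A real profile would also pass a reordered :ciphers list (omitted here).
  def ssl_opts(profile) when profile in [:chrome, :safari] do
    [versions: @browser_versions, eccs: @browser_curves]
  end
end
```

Options like these would be handed to :httpc.request/4 as the ssl entry of its HTTP options, alongside the request headers that carry the browser user-agent.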
pages come back as ORG
This is the signature move. Extract doesn't return raw HTML or some bespoke
JSON blob you have to learn. A fetched page is a small, stable shape —
url, title, description,
headings, links, and readable text —
and from that shape it can render org. A browse result drops
straight into the workbook's context repository as the same grammar everything
else in the system speaks.
A fetched URL becomes one org node, tagged :source:point:, with a
properties drawer carrying its URL and host, an outline of its headings, its
links, and its readable text. Here's the real shape Extract emits:
* Elixir v1.16 — Documentation :source:point:
:PROPERTIES:
:URL: https://hexdocs.pm/elixir
:HOST: hexdocs.pm
:END:
Elixir is a dynamic, functional language…
** outline
- Getting started
- Modules and functions
** links
- [[https://hexdocs.pm/elixir/Kernel.html][Kernel]]
** text
Elixir is a dynamic, functional language for building scalable…
A web page, landed as the same org the rest of the system reads and writes —
links capped at 40, text at 4000 characters, by design, so a single fetch can't
flood the context with an entire site. A crawl concatenates its pages into one
org document headed #+TITLE: browse crawl — N pages. The web stops
being a foreign format the moment it crosses the engine boundary. (Today the
extractor is regex-grade; CSS selectors and table parsing via Floki are a stated
upgrade — but the page contract stays put when they land.)
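To make the page-to-org step concrete, here is a minimal sketch of a renderer that produces the layout shown above from the six-field page shape. It is an illustration, not Extract's code: the module OrgSketch, the function to_org/1, and the representation of each link as a {href, label} tuple are assumptions. The caps of 40 links and 4000 characters are the ones stated in the text.

```elixir
defmodule OrgSketch do
  @max_links 40
  @max_text 4000

  # Render one page map (url, title, description, headings, links, text)
  # as a single org node tagged :source:point:.
  def to_org(page) do
    host = URI.parse(page.url).host

    outline = Enum.map(page.headings, fn heading -> "- " <> heading end)

    links =
      page.links
      |> Enum.take(@max_links)
      |> Enum.map(fn {href, label} -> "- [[#{href}][#{label}]]" end)

    lines =
      [
        "* #{page.title} :source:point:",
        ":PROPERTIES:",
        ":URL: #{page.url}",
        ":HOST: #{host}",
        ":END:",
        page.description,
        "** outline"
      ] ++
        outline ++
        ["** links"] ++
        links ++
        ["** text", String.slice(page.text, 0, @max_text)]

    Enum.join(lines, "\n")
  end
end
```

The caps are applied at render time in this sketch, so an oversized page is trimmed before it ever reaches the context repository.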
BFS on the BEAM
depth rung · skippable — the crawler's shape, for the curious
Crawl has two entry points. pages/2 takes a known list of URLs and
fetches them concurrently. site/2 takes a single seed and walks
breadth-first, following only same-host links, defaulting to 25 pages at depth 2.
Both fan out over Task.async_stream — no thread pool, no sidecar —
eight requests wide, with a per-task timeout of 25 seconds. This is the BEAM's
"millions of cheap processes" claim doing ordinary work: concurrency is the
language's birthright here, not a library you bolt on.
flowchart LR
seed["a seed URL"] --> frontier["the frontier
URLs not yet visited"]
frontier --> fan["fan out — 8 wide
Task.async_stream"]
fan --> got["pages that came back
dead URLs dropped silently"]
got --> filt{"new links,
same host only?"}
filt -- "yes, and under
max_pages / depth" --> frontier
filt -- "limit reached" --> done["the crawl returns
everything that came back"]
style seed fill:#ffffff,stroke:#121316
style fan fill:#aee5c2,stroke:#121316,stroke-width:2.5px
style done fill:#f2ddb0,stroke:#121316
style filt fill:#fbfaf6,stroke:#121316
Follow the loop as a story. A seed becomes the frontier; the frontier fans out eight at a time; whatever comes back has its same-host links harvested and fed into the next wave — and whatever doesn't come back is simply dropped. A dead URL doesn't sink the crawl; the result is everything that returned, not a demand that everything return. The loop exits when it hits the page cap or the depth limit, and hands back the pages it gathered.
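The loop is short when written out. The sketch below is one way to express it, assuming a fetch function that is passed in and returns {:ok, page} or {:error, reason}, where a page is a map with :url and :links. The module CrawlSketch, that fetch contract, and the way depth is counted (the seed is depth 0) are assumptions for illustration; the width of 8, the 25-second task timeout, and the 25-page and depth-2 defaults come from the text.

```elixir
defmodule CrawlSketch do
  # Breadth-first, same-host crawl from one seed.
  def site(seed, fetch, opts \\ []) do
    cfg = %{
      fetch: fetch,
      host: URI.parse(seed).host,
      max_pages: Keyword.get(opts, :max_pages, 25),
      max_depth: Keyword.get(opts, :max_depth, 2)
    }

    walk([seed], MapSet.new([seed]), [], 0, cfg)
  end

  defp walk(frontier, seen, pages, depth, cfg) do
    room = cfg.max_pages - length(pages)

    if frontier == [] or room <= 0 or depth > cfg.max_depth do
      pages
    else
      # Fan out one wave, eight wide; failures and timeouts are dropped.
      got =
        frontier
        |> Enum.take(room)
        |> Task.async_stream(cfg.fetch,
          max_concurrency: 8,
          timeout: 25_000,
          on_timeout: :kill_task
        )
        |> Enum.flat_map(fn
          {:ok, {:ok, page}} -> [page]
          _ -> []
        end)

      # Harvest unseen same-host links for the next wave.
      next =
        got
        |> Enum.flat_map(fn page -> page.links end)
        |> Enum.filter(fn url -> URI.parse(url).host == cfg.host end)
        |> Enum.uniq()
        |> Enum.reject(fn url -> MapSet.member?(seen, url) end)

      walk(next, MapSet.union(seen, MapSet.new(next)), pages ++ got, depth + 1, cfg)
    end
  end
end
```

Passing the fetch function in keeps the sketch testable against an in-memory map of pages, with no network involved.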
search and the datacenter WALL
Search is keyless by default — and it carries the truest war story in the
codebase. Native search scrapes the SERP pages of real engines —
duckduckgo, brave, bing — using the
DuckDuckGo HTML endpoint first because it's the most scrape-stable, and taking
the first engine that returns anything. It even inherits the TLS-fingerprinted,
browser-headed GET from Fetch, so the scrape looks like a browser.
And it works beautifully — on a laptop. The problem, written right into the
code, is that these keyless scrape engines return empty from datacenter
IP addresses, because non-residential traffic gets served captchas instead of
results. On a cloud box, web_search silently finds nothing — and
this stranded the project's own landing site even with a strong model behind it.
So when $DATAFORSEO_AUTH is set (a base64
login:password) and no engine is forced, Browse goes through
DataForSEO's real SERP API first — a US-located Google organic query, depth
matched to your limit — and falls through to scraping only if that's absent or
errors.
flowchart TD
q["a search query"]
k{"$DATAFORSEO_AUTH set,
no engine forced?"}
dfs["DataForSEO SERP API
real Google organic results"]
ddg["scrape DuckDuckGo HTML
most scrape-stable"]
brave["scrape Brave"]
bing["scrape Bing"]
res["results — first non-empty wins
title · url · snippet"]
q --> k
k -- "yes — the cloud answer" --> dfs
k -- "no — the laptop answer" --> ddg
dfs -- "absent or errors" --> ddg
ddg -- "empty" --> brave
brave -- "empty" --> bing
dfs --> res
ddg --> res
brave --> res
bing --> res
style q fill:#ffffff,stroke:#121316
style dfs fill:#aee5c2,stroke:#121316,stroke-width:2.5px
style ddg fill:#f2ddb0,stroke:#121316
style res fill:#ffffff,stroke:#121316
The fork in that graph is the whole story. On a laptop, the residential IP sails through the scrape engines and you never think about it. On a fly box, the datacenter IP hits the captcha wall and the scrape cascade returns nothing — so the API lane exists precisely to catch the case the free lane can't. One environment variable moves the laptop answer to the cloud answer. (The scraped DuckDuckGo hrefs are unwrapped from their redirect parameter; Brave and Bing fall back to a generic anchor scrape that rejects junk hosts like the engines' own domains.)
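The cascade reduces to two decisions: which lanes to try, and taking the first lane that returns anything. Here is a minimal sketch of both, with the lanes represented as atoms and the actual lookup injected as a function so nothing touches the network. SearchCascadeSketch, lanes/2, search/3, and the choice to treat a forced engine as a single-lane list are assumptions; the lane order and the role of DATAFORSEO_AUTH follow the text.

```elixir
defmodule SearchCascadeSketch do
  @scrape_lanes [:duckduckgo, :brave, :bing]

  # Decide the lane order from the environment (a map here, for testability).
  def lanes(env, forced_engine \\ nil) do
    cond do
      forced_engine != nil -> [forced_engine]
      env["DATAFORSEO_AUTH"] not in [nil, ""] -> [:dataforseo | @scrape_lanes]
      true -> @scrape_lanes
    end
  end

  # Try each lane in order; the first non-empty result list wins.
  # Errors and empty lists both fall through to the next lane.
  def search(query, lanes, run) do
    Enum.find_value(lanes, [], fn lane ->
      case run.(lane, query) do
        {:ok, [_ | _] = results} -> results
        _ -> nil
      end
    end)
  end
end
```

On a datacenter IP every scrape lane would return an empty list, so search/3 returns [] unless the API lane is at the front of the list, which is the failure and the fix described above.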
swapping the engine of the ENGINE
Native can't run JavaScript or beat a bot-wall — and that's exactly why the provider slot exists. The shipped, proven example is Firecrawl: it renders JS and defeats bot-walls, and it slots in with one config line.
config :workbooks, :browse, provider: Workbooks.Browse.Firecrawl
That's the entire change. Every fetch, crawl, and search now routes to
Firecrawl instead of Native — and no caller changes, including brandnana's
harvest. The reason is the contract: Firecrawl POSTs to its
/v1/scrape endpoint for rendered HTML, then runs it through the very
same Extract.parse as Native, so it returns the identical page
shape. Providers converge on one output type; the caller cannot tell which
provider ran.
sequenceDiagram
participant C as a caller
participant B as Browse
participant F as Firecrawl
participant E as Extract
C->>B: fetch(url)
B->>F: POST /v1/scrape — formats: [html]
F-->>B: rendered HTML (JS executed, bot-wall passed)
B->>E: Extract.parse(html, url)
E-->>C: the same page shape Native returns
Note over C,E: the caller can't tell which provider ran
Walk that exchange through. The caller asks Browse to fetch; Browse asks Firecrawl, which renders the page with a real browser engine and hands back HTML that JavaScript has already populated; Browse runs that HTML through the same extractor; the caller gets back the same page it always gets. The provider is the engine of the engine, and you can change it without anyone downstream noticing.
Two more facts make the slot complete. First, every provider method —
fetch, crawl, search — is an
optional callback, so a partial provider can implement only what it can
do and let capabilities/0 declare exactly that; the dispatcher's
fallback covers the rest. Second, the proxy is its own knob: configure
:proxy as a {host, port} tuple or a zero-arity function
that returns one, and a rotating-proxy service turns the vanilla browser into a
fuller scraper. (A real one shipped as a fly secret —
FIRECRAWL_API_KEY — proving the slot in production. An Exa-style
search provider is the kind of thing you'd plug in; Firecrawl is the one external
provider shipped today.)
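Both facts map onto ordinary Elixir features. The sketch below shows a behaviour whose verb callbacks are optional, a partial provider that implements only fetch, and a helper that accepts the proxy either as a tuple or as a zero-arity function. All module names (ProviderSketch, FetchOnlySketch, ProxySketch) and the callback return types are assumptions made for the example, not the engine's actual definitions.

```elixir
defmodule ProviderSketch do
  # Every provider declares its verbs; the verbs themselves are optional.
  @callback capabilities() :: [atom()]
  @callback fetch(String.t(), keyword()) :: {:ok, map()} | {:error, term()}
  @callback crawl([String.t()] | String.t(), keyword()) :: {:ok, [map()]} | {:error, term()}
  @callback search(String.t(), keyword()) :: {:ok, [map()]} | {:error, term()}
  @optional_callbacks fetch: 2, crawl: 2, search: 2
end

defmodule FetchOnlySketch do
  # A partial provider: it implements fetch and declares exactly that.
  @behaviour ProviderSketch

  @impl true
  def capabilities, do: [:fetch]

  @impl true
  def fetch(url, _opts) do
    # Placeholder page in the shared shape; a real provider would do I/O here.
    {:ok, %{url: url, title: "", description: "", headings: [], links: [], text: ""}}
  end
end

defmodule ProxySketch do
  # The proxy may be a fixed {host, port} or a function that picks one per call.
  def resolve(nil), do: nil
  def resolve({host, port}) when is_integer(port), do: {host, port}
  def resolve(fun) when is_function(fun, 0), do: fun.()
end
```

A rotating-proxy service fits the function form: each call to the function can return a different {host, port}, and callers never need to know rotation is happening.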
who gets to BROWSE
Browse is a capability with three consumer surfaces, and a policy that decides who reaches it. The surfaces all return the same structured page; the policy decides whether the call links at all.
| surface | caller | granted by | what comes back |
|---|---|---|---|
| agent tools | an agent calling web_search or fetch | the agent's profile | bulleted title / url / snippet text |
| browse-fetch | a WASM component through the Dock | network · posix | a JSON page string; the component never opens a socket |
| POST /api/browse | an authed caller on the control plane | the auth plug | org or JSON, per the request |
The verdict of that table: same capability, three doors, one shape coming back.
The web_search agent tool is the research capability the agents
regained when the native run/curl hatch was removed — it calls Browse's search and
returns a clean bulleted list. The Dock surface is the strict one: capability
"browse" becomes a typed import browse-fetch, and the
host owns network egress so the component never opens a socket — no
wasi:http or sockets are granted. The HTTP surface is
POST /api/browse on the authed control plane, where
"as": "org" returns crawl org and "as": "json" returns
a page summary.
And the gate is by construction, not by checkbox. Browse is granted by the
network and posix policy profiles — and denied by
minimal and compute. A component that hard-imports
browse-fetch under minimal doesn't get a polite runtime
error; it fails to link. The denial is in the wiring, proven end-to-end by
a real test that runs a prebuilt probe component against a local server and
watches the unprivileged build refuse to come together.
what the free browser ISN'T
Honesty section — and every limit here has its escape hatch built into the same design, which is the whole point of the slot.
- No JavaScript, no bot-wall victories. Native fetches HTML as served; a single-page app or a wall returns thin or nothing. The hatch: one config line swaps in Firecrawl, which renders JS and defeats walls.
- The TLS fingerprint approximates. The handshake posture is real and controllable, but byte-exact JA3/JA4 is future work. The hatch: the proxy knob and an external provider for the cases that need exactness today.
- Extract is regex-grade. Title, meta, headings, links, text — no CSS selectors or table parsing yet. The hatch: Floki is the stated upgrade, and the page contract won't move when it lands.
- Keyless search dies on datacenter IPs. The captcha wall is real and silent in the cloud. The hatch: one env var, $DATAFORSEO_AUTH, or a configured search provider.
- Proxied fetches serialize. :httpc's proxy setting is process-global, so proxied requests go one at a time by design. The hatch: a per-proxy connection pool is a stated follow-up; unproxied fetches stay concurrent.
- One proven external provider. Firecrawl ships; an Exa module does not exist yet. The hatch: the slot is the same regardless; anything that returns the page shape drops in.
questions people actually ASK
Is it really free out of the box?
Yes. The native browser is pure BEAM — Erlang's own TLS and HTTP, a regex extractor, a concurrent crawler, and keyless SERP scraping. No keys, no external service, no meter. Your agent can fetch, crawl, and search the web the minute the engine is up, and what comes back is structured — a parsed page or an org node, not a wall of HTML.
Why is search empty on my cloud deploy?
Because keyless SERP scraping returns nothing from datacenter IPs — the
engines serve captchas to non-residential traffic, silently. This is the war
story written into the code. The fix is one secret:
fly secrets set DATAFORSEO_AUTH=$(printf 'login:password' | base64),
and search now goes DataForSEO-first with the scrape cascade as fallback. No
workbook changes.
Do my agents hold the Firecrawl key?
No — and that's the same pattern as secrets generally. FIRECRAWL_API_KEY
and DATAFORSEO_AUTH live in the engine's environment and never
enter the sandbox. The workload asks Browse to fetch; the engine holds the
credential and makes the call. A prompt-injected agent can't leak a key it was
never handed.
Can a workbook just open a socket instead?
Only if it's granted the network capability — and even then, browse is the
brokered lane that exists so it usually doesn't have to. Through the Dock, the
component imports browse-fetch and the host owns egress; the
component never sees a socket. Raw network is a specific, separate grant, denied
by default, and reserved for when brokered browsing genuinely isn't enough.
Does it respect robots.txt?
Honestly: there's no robots.txt handling in the code today. Native fetch and crawl don't read or honor a robots file — crawl limits itself to same-host links under a page and depth cap, but that's politeness by accident, not by policy. If you need robots compliance, that's a gap to close, and we'd rather say so than imply a check that isn't there.
Does the caller know which provider ran?
No, and that's the design. Native, Firecrawl, a future search provider — they
all converge on one page shape, because external providers run their output
through the same Extract.parse the native browser uses. You change
the provider in one config line; every caller, including brandnana's harvest,
keeps working untouched.
keep GOING
Browsing is the capability and grant model from the parent lesson, applied to the web. The neighbors below are the rest of that doctrine.