Browsing — the web, from inside a sandbox

the web, from inside a SANDBOX

Sooner or later your software needs the web. An agent has to research a question; an app has to fetch a page; a job has to crawl a docs site. The trouble is that every familiar way to grant that need is a bad one — and the lesson this page lives under spelled out exactly why.

You could give the workload raw network. But a workload here is often a model, and the Nexus security section named the failure mode plainly: anyone whose text reaches an agent is partially steering it. A prompt-injected agent with open egress is an exfiltration machine: it reads a hostile web page and POSTs your secrets somewhere in the same breath. You could shell out to curl, except this runtime deliberately has no native exec; there is no shell to reach. Or you could hard-wire a paid scraping API into every caller: a key in every script, a vendor in every line of code, and a credential sitting one prompt injection away from a stranger.

Three bad answers, one shared mistake: they all put the web inside the workload. This lesson is about putting it one layer down — in the engine, where the workload can ask for the web without ever holding it.

the DEFINITION

browse /braʊz/ capability

1. the runtime's web capability: three verbs — fetch one page, crawl a set or a site, search the open web — fulfilled by the engine on the workload's behalf, returned as structured data, with a free native browser as the default and the provider for each verb a config value.

One sentence per verb. Fetch takes a URL and gives back a parsed page. Crawl takes a list of URLs or a single seed and gives back many pages — concurrently, breadth-first, same-host. Search takes a query and gives back results. The workload never opens a socket for any of them. Workbooks.Browse is a general runtime primitive, not a feature of any one app — brandnana's harvest is simply one of its callers.

one slot, three VERBS

Browse is a dispatcher. Each verb resolves its own provider, independently, the moment it's called. The rule is small and worth knowing exactly, because it is what makes "swap the provider" a one-line change:

For search, an explicitly configured search_provider wins outright.
Otherwise, the configured provider handles the verb — but only if it declares it. A provider advertises what it can do through a capabilities/0 function; Browse checks cap in mod.capabilities() before routing.
If the configured provider doesn't declare the verb, Browse falls back to the free Native browser — which declares all three.

And the proxy is orthogonal. Whatever provider resolves, Browse merges the configured :proxy into the call's options unless the caller already set one. Choosing a provider and choosing a proxy are two separate knobs that don't interfere.

flowchart TD
  call["a caller asks Browse
fetch · crawl · search"]
  s{"verb is search
and a search_provider
is configured?"}
  d{"configured provider
declares this verb
via capabilities/0?"}
  sp["the search provider
e.g. a SERP API"]
  cp["the configured provider
e.g. Firecrawl"]
  nat["Native — the free
built-in browser"]
  px["merge the configured proxy
unless the caller set one"]
  call --> s
  s -- "yes" --> sp
  s -- "no" --> d
  d -- "yes" --> cp
  d -- "no" --> nat
  sp --> px
  cp --> px
  nat --> px
  px --> out["the verb runs — same page shape back"]
  style call fill:#ffffff,stroke:#121316
  style nat fill:#aee5c2,stroke:#121316,stroke-width:2.5px
  style sp fill:#f2ddb0,stroke:#121316
  style cp fill:#f2ddb0,stroke:#121316
  style px fill:#fbfaf6,stroke:#121316
  style out fill:#ffffff,stroke:#121316

Read the graph top to bottom as the dispatcher's whole decision. A call comes in; if it's a search and you named a search provider, that wins and we're done. Otherwise we ask the one question that matters — does the configured provider declare this verb? If yes, it runs; if no, Native catches it, because Native declares everything. Then, on every path, the proxy gets merged in. The free browser is the floor nothing falls below.
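The three routing rules and the proxy merge are small enough to sketch in a few lines. This is a minimal illustration under stated assumptions, not the engine's actual code: the module names `BrowseSketch.*`, the `resolve/2` and `with_proxy/2` functions, and the config keys as a keyword list are all hypothetical; only `capabilities/0`, `search_provider`, and `:proxy` come from the text above.

```elixir
defmodule BrowseSketch.Native do
  # Hypothetical stand-in for the free built-in browser: declares all three verbs.
  def capabilities, do: [:fetch, :crawl, :search]
end

defmodule BrowseSketch.ScrapeOnly do
  # Hypothetical partial provider: it only knows how to fetch.
  def capabilities, do: [:fetch]
end

defmodule BrowseSketch.Dispatch do
  @native BrowseSketch.Native

  # Pick the provider module for one verb, given a config keyword list.
  def resolve(verb, config) do
    search_provider = Keyword.get(config, :search_provider)
    provider = Keyword.get(config, :provider, @native)

    cond do
      # Rule 1: an explicit search provider wins outright for search.
      verb == :search and search_provider != nil -> search_provider
      # Rule 2: the configured provider handles the verb only if it declares it.
      verb in provider.capabilities() -> provider
      # Rule 3: otherwise fall back to Native, which declares everything.
      true -> @native
    end
  end

  # The proxy is a separate knob: merge it in unless the caller already set one.
  def with_proxy(opts, config) do
    case Keyword.fetch(config, :proxy) do
      {:ok, proxy} -> Keyword.put_new(opts, :proxy, proxy)
      :error -> opts
    end
  end
end
```

With `provider: BrowseSketch.ScrapeOnly` configured, a fetch resolves to `ScrapeOnly` while a crawl lands on `Native`, which is the fallback behaviour the graph describes.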

the free BROWSER

The default provider is Native: free, in-engine, no keys, no external service. It's pure BEAM — Erlang's own :ssl and :httpc doing the fetching, a lightweight extractor turning HTML into structure, and a concurrent crawler — and it's built from three small bricks.

brick	what it does	real defaults
Fetch	pure-Erlang `:httpc` + `:ssl` GET — no Rust, no port, no sidecar — following redirects by hand	20s timeout · 5 redirects max
Extract	zero-dependency regex + tag-strip → title, meta/OpenGraph, h1–h3, links, readable text	headings capped at 60 · text at 4000 chars
Crawl	fan-out over a URL list, or breadth-first from one seed, on `Task.async_stream`	concurrency 8 · 25 pages · depth 2

The verdict of that table: every default is conservative on purpose. A fetch that 2xx-es is parsed into a page; a non-2xx comes back as a clean {:http_status, code} error rather than a guess. A crawl dispatches on shape — hand it a list of URLs and it fetches them concurrently; hand it a single seed string and it walks the site breadth-first. No part of this asks for a credential, because no part of it talks to anyone but the origin server.
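Both behaviours in that paragraph come down to pattern matching, and a short sketch may make them concrete. Everything named here is an assumption for illustration: the modules `FetchSketch` and `CrawlShapeSketch`, the `{status, body}` input, the `{:error, ...}` wrapper around `{:http_status, code}`, and the `parse` function standing in for the extractor.

```elixir
defmodule FetchSketch do
  # Turn a raw {status, body} response into the two outcomes the text describes.
  # `parse` is a hypothetical stand-in for the extractor.
  def classify({status, body}, parse) when status in 200..299, do: {:ok, parse.(body)}
  # Anything outside 2xx becomes a clean error carrying the status code.
  def classify({status, _body}, _parse), do: {:error, {:http_status, status}}
end

defmodule CrawlShapeSketch do
  # A list of URLs: fetch exactly those, concurrently.
  def crawl(urls) when is_list(urls), do: {:pages, urls}
  # A single seed string: walk the site breadth-first from it.
  def crawl(seed) when is_binary(seed), do: {:site, seed}
end
```

The sketch returns tags (`:pages`, `:site`) instead of doing any work, so the dispatch on shape is visible without a network call.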

dressing like a BROWSER

depth rung · skippable — the genuinely surprising bit, for the curious

Here's the part that surprises people. A bare HTTP client looks nothing like a browser, and many sites notice. So Fetch shapes its TLS handshake to resemble one — and it does it from pure Elixir, no native dependency. There are named handshake profiles — :default, :chrome, :safari — and the default profile is :chrome.

The Chrome and Safari profiles force the version posture a real browser ships with — versions: [:"tlsv1.3", :"tlsv1.2"] — reorder the cipher list, and set the elliptic curves a browser offers: eccs: [:x25519, :secp256r1, :secp384r1]. On top of that ride real browser request headers: genuine Chrome 124 and Safari 17.4 user-agent strings, with matching accept and accept-language lines. A spike proved the point that matters — the same endpoint completes under default TLS 1.3 and under forced TLS 1.2 with reordered ciphers. The ClientHello is ours to shape, from the BEAM, with no sidecar.

	:default	:chrome (the default profile)
TLS versions	library default	1.3 then 1.2, forced
cipher order	library default	reordered to match a browser
curves (eccs)	library default	x25519 · secp256r1 · secp384r1
user-agent	browser string	real Chrome 124

The honest limit is stated in the code itself: these profiles approximate a browser's version and cipher posture as it stands today. A byte-exact fingerprint encoder (the extension ordering and GREASE values that produce a matching JA3/JA4 hash) is future work, not a present claim. What exists today is real control over the handshake; what doesn't yet is byte-for-byte mimicry.
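A profile is, in the end, a handful of `:ssl` options. The sketch below shows one way the quoted values could be organised. It is an assumption-laden illustration: the module `TlsProfileSketch`, its function names, and the idea that `:chrome` and `:safari` share one option list are hypothetical, and the real cipher order is not given in the text, so `reorder/2` is demonstrated on placeholder atoms.

```elixir
defmodule TlsProfileSketch do
  # Options shared by the browser-like profiles, using the values quoted above.
  @browser_opts [
    versions: [:"tlsv1.3", :"tlsv1.2"],
    eccs: [:x25519, :secp256r1, :secp384r1]
  ]

  # :default leaves everything to the library.
  def ssl_opts(:default), do: []
  # Assumption: both browser profiles use the same version and curve posture.
  def ssl_opts(profile) when profile in [:chrome, :safari], do: @browser_opts

  # Move preferred suites to the front, in the preferred order,
  # while keeping the remaining suites in their original order.
  def reorder(suites, preferred) do
    {front, rest} = Enum.split_with(suites, &(&1 in preferred))
    Enum.sort_by(front, fn s -> Enum.find_index(preferred, &(&1 == s)) end) ++ rest
  end
end
```

A list like this would typically be handed to `:httpc` through the `ssl` entry of its HTTP options; how the engine actually wires it is not shown here.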

pages come back as ORG

This is the signature move. Extract doesn't return raw HTML or some bespoke JSON blob you have to learn. A fetched page is a small, stable shape — url, title, description, headings, links, and readable text — and from that shape it can render org. A browse result drops straight into the workbook's context repository as the same grammar everything else in the system speaks.

A fetched URL becomes one org node, tagged :source:point:, with a properties drawer carrying its URL and host, an outline of its headings, its links, and its readable text. Here's the real shape Extract emits:

* Elixir v1.16 — Documentation                                    :source:point:
  :PROPERTIES:
  :URL:    https://hexdocs.pm/elixir
  :HOST:   hexdocs.pm
  :END:
  Elixir is a dynamic, functional language…
** outline
   - Getting started
   - Modules and functions
** links
   - [[https://hexdocs.pm/elixir/Kernel.html][Kernel]]
** text
   Elixir is a dynamic, functional language for building scalable…

A web page, landed as the same org the rest of the system reads and writes — links capped at 40, text at 4000 characters, by design, so a single fetch can't flood the context with an entire site. A crawl concatenates its pages into one org document headed #+TITLE: browse crawl — N pages. The web stops being a foreign format the moment it crosses the engine boundary. (Today the extractor is regex-grade; CSS selectors and table parsing via Floki are a stated upgrade — but the page contract stays put when they land.)
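To make the page-to-org step concrete, here is a minimal sketch of a renderer that follows the layout of the example node. It is not the real Extract code: the module `OrgSketch`, the atom-keyed page map, the `{href, label}` link tuples, and the exact whitespace are assumptions; the caps of 40 links and 4000 characters are the ones stated above.

```elixir
defmodule OrgSketch do
  @max_links 40
  @max_text 4000

  # Render a page map into one org node, following the layout shown above.
  def to_org(page) do
    host = URI.parse(page.url).host
    # Apply the caps before rendering so one fetch cannot flood the context.
    links = Enum.take(page.links, @max_links)
    text = String.slice(page.text, 0, @max_text)

    [
      "* #{page.title} :source:point:",
      "  :PROPERTIES:",
      "  :URL:    #{page.url}",
      "  :HOST:   #{host}",
      "  :END:",
      "  #{page.description}",
      "** outline",
      Enum.map(page.headings, &"   - #{&1}"),
      "** links",
      Enum.map(links, fn {href, label} -> "   - [[#{href}][#{label}]]" end),
      "** text",
      "   #{text}"
    ]
    |> List.flatten()
    |> Enum.join("\n")
  end
end
```

Because the renderer only reads the page shape, any provider that returns that shape gets the same org output for free.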

BFS on the BEAM

depth rung · skippable — the crawler's shape, for the curious

Crawl has two entry points. pages/2 takes a known list of URLs and fetches them concurrently. site/2 takes a single seed and walks breadth-first, following only same-host links, defaulting to 25 pages at depth 2. Both fan out over Task.async_stream — no thread pool, no sidecar — eight requests wide, with a per-task timeout of 25 seconds. This is the BEAM's "millions of cheap processes" claim doing ordinary work: concurrency is the language's birthright here, not a library you bolt on.

flowchart LR
  seed["a seed URL"] --> frontier["the frontier
URLs not yet visited"]
  frontier --> fan["fan out — 8 wide
Task.async_stream"]
  fan --> got["pages that came back
dead URLs dropped silently"]
  got --> filt{"new links,
same host only?"}
  filt -- "yes, and under
max_pages / depth" --> frontier
  filt -- "limit reached" --> done["the crawl returns
everything that came back"]
  style seed fill:#ffffff,stroke:#121316
  style fan fill:#aee5c2,stroke:#121316,stroke-width:2.5px
  style done fill:#f2ddb0,stroke:#121316
  style filt fill:#fbfaf6,stroke:#121316

Follow the loop as a story. A seed becomes the frontier; the frontier fans out eight at a time; whatever comes back has its same-host links harvested and fed into the next wave — and whatever doesn't come back is simply dropped. A dead URL doesn't sink the crawl; the result is everything that returned, not a demand that everything return. The loop exits when it hits the page cap or the depth limit, and hands back the pages it gathered.
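The loop can be sketched as a short recursive function over `Task.async_stream`. This is an illustration under assumptions, not the engine's crawler: the module `BfsSketch`, the option names `:max_pages` and `:max_depth`, the convention that the seed is depth 0, and the injected `fetch` function (which returns `{:ok, %{url: ..., links: [...]}}` or an error tuple) are all hypothetical. The width of 8, the 25-second task timeout, and the 25-page, depth-2 defaults are the figures quoted above.

```elixir
defmodule BfsSketch do
  # Breadth-first, same-host crawl. `fetch` is a hypothetical function:
  # url -> {:ok, %{url: url, links: [url]}} | {:error, reason}
  def site(seed, fetch, opts \\ []) do
    max_pages = Keyword.get(opts, :max_pages, 25)
    max_depth = Keyword.get(opts, :max_depth, 2)
    host = URI.parse(seed).host
    walk([seed], MapSet.new([seed]), [], 0, {fetch, host, max_pages, max_depth})
  end

  defp walk(frontier, seen, pages, depth, {fetch, host, max_pages, max_depth} = cfg) do
    room = max_pages - length(pages)

    if frontier == [] or room <= 0 or depth > max_depth do
      Enum.reverse(pages)
    else
      # Fan out eight wide; anything that fails or times out is dropped silently.
      got =
        frontier
        |> Enum.take(room)
        |> Task.async_stream(fetch, max_concurrency: 8, timeout: 25_000, on_timeout: :kill_task)
        |> Enum.flat_map(fn
          {:ok, {:ok, page}} -> [page]
          _dead -> []
        end)

      # Harvest same-host links we have not queued before: the next wave.
      next =
        got
        |> Enum.flat_map(& &1.links)
        |> Enum.filter(&(URI.parse(&1).host == host))
        |> Enum.uniq()
        |> Enum.reject(&MapSet.member?(seen, &1))

      walk(next, MapSet.union(seen, MapSet.new(next)), Enum.reverse(got) ++ pages, depth + 1, cfg)
    end
  end
end
```

Injecting `fetch` keeps the sketch testable against an in-memory map of pages, with no network involved.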

search and the datacenter WALL

Search is keyless by default — and it carries the truest war story in the codebase. Native search scrapes the SERP pages of real engines — duckduckgo, brave, bing — using the DuckDuckGo HTML endpoint first because it's the most scrape-stable, and taking the first engine that returns anything. It even inherits the TLS-fingerprinted, browser-headed GET from Fetch, so the scrape looks like a browser.

And it works beautifully — on a laptop. The problem, written right into the code, is that these keyless scrape engines return empty from datacenter IP addresses, because non-residential traffic gets served captchas instead of results. On a cloud box, web_search silently finds nothing — and this stranded the project's own landing site even with a strong model behind it. So when $DATAFORSEO_AUTH is set (a base64 login:password) and no engine is forced, Browse goes through DataForSEO's real SERP API first — a US-located Google organic query, depth matched to your limit — and falls through to scraping only if that's absent or errors.

flowchart TD
  q["a search query"]
  k{"$DATAFORSEO_AUTH set,
no engine forced?"}
  dfs["DataForSEO SERP API
real Google organic results"]
  ddg["scrape DuckDuckGo HTML
most scrape-stable"]
  brave["scrape Brave"]
  bing["scrape Bing"]
  res["results — first non-empty wins
title · url · snippet"]
  q --> k
  k -- "yes — the cloud answer" --> dfs
  k -- "no — the laptop answer" --> ddg
  dfs -- "absent or errors" --> ddg
  ddg -- "empty" --> brave
  brave -- "empty" --> bing
  dfs --> res
  ddg --> res
  brave --> res
  bing --> res
  style q fill:#ffffff,stroke:#121316
  style dfs fill:#aee5c2,stroke:#121316,stroke-width:2.5px
  style ddg fill:#f2ddb0,stroke:#121316
  style res fill:#ffffff,stroke:#121316

The fork in that graph is the whole story. On a laptop, the residential IP sails through the scrape engines and you never think about it. On a Fly.io box, the datacenter IP hits the captcha wall and the scrape cascade returns nothing, so the API lane exists precisely to catch the case the free lane can't. One environment variable moves the laptop answer to the cloud answer. (The scraped DuckDuckGo hrefs are unwrapped from their redirect parameter; Brave and Bing fall back to a generic anchor scrape that rejects junk hosts like the engines' own domains.)
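The cascade itself is a "first non-empty wins" fold, which a short sketch can show. The names are assumptions for illustration: the module `SearchCascadeSketch`, engines passed as `{name, fun}` pairs, the env passed as a plain map, and the behaviour when an engine is forced (here: use only that engine) are hypothetical; only the `DATAFORSEO_AUTH` variable and the fall-through order come from the text.

```elixir
defmodule SearchCascadeSketch do
  # `engines` is a list of {name, fun} pairs; each fun takes a query and
  # returns {:ok, results} or {:error, reason}. All are hypothetical stand-ins.
  def search(query, engines) do
    # Try engines in order; the first one with a non-empty result list wins.
    Enum.find_value(engines, {:ok, []}, fn {_name, run} ->
      case run.(query) do
        {:ok, [_ | _] = results} -> {:ok, results}
        _empty_or_error -> nil
      end
    end)
  end

  # Put the API lane first only when the credential is present and no engine is forced.
  # Assumption: a forced engine means "use only that scraper".
  def lanes(scrapers, api, env, forced \\ nil) do
    cond do
      forced != nil -> Enum.filter(scrapers, fn {name, _} -> name == forced end)
      Map.has_key?(env, "DATAFORSEO_AUTH") -> [api | scrapers]
      true -> scrapers
    end
  end
end
```

An engine that errors and an engine that returns an empty list are treated the same way: the cascade simply moves on to the next lane.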

swapping the engine of the ENGINE

Native can't run JavaScript or beat a bot-wall — and that's exactly why the provider slot exists. The shipped, proven example is Firecrawl: it renders JS and defeats bot-walls, and it slots in with one config line.

config :workbooks, :browse, provider: Workbooks.Browse.Firecrawl

That's the entire change. Every fetch, crawl, and search now routes to Firecrawl instead of Native — and no caller changes, including brandnana's harvest. The reason is the contract: Firecrawl POSTs to its /v1/scrape endpoint for rendered HTML, then runs it through the very same Extract.parse as Native, so it returns the identical page shape. Providers converge on one output type; the caller cannot tell which provider ran.

sequenceDiagram
  participant C as a caller
  participant B as Browse
  participant F as Firecrawl
  participant E as Extract
  C->>B: fetch(url)
  B->>F: POST /v1/scrape — formats: [html]
  F-->>B: rendered HTML (JS executed, bot-wall passed)
  B->>E: Extract.parse(html, url)
  E-->>C: the same page shape Native returns
  Note over C,E: the caller can't tell which provider ran

Walk that exchange through. The caller asks Browse to fetch; Browse asks Firecrawl, which renders the page with a real browser engine and hands back HTML that JavaScript has already populated; Browse runs that HTML through the same extractor; the caller gets back the same page it always gets. The provider is the engine of the engine, and you can change it without anyone downstream noticing.

Two more facts make the slot complete. First, every provider method (fetch, crawl, search) is an optional callback, so a partial provider can implement only what it can do and let capabilities/0 declare exactly that; the dispatcher's fallback covers the rest. Second, the proxy is its own knob: configure :proxy as a {host, port} tuple or a zero-arity function that returns one, and a rotating-proxy service turns the vanilla browser into a fuller scraper. (The Firecrawl provider shipped for real, with its key held as a Fly secret, FIRECRAWL_API_KEY, proving the slot in production. An Exa-style search provider is the kind of thing you'd plug in; Firecrawl is the one external provider shipped today.)
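A partial provider might look like the sketch below. It is a hypothetical illustration, not the real contract: the behaviour module `ProviderSketch`, the callback signatures and return types, the `SearchOnly` provider, and the `proxy/1` helper are all assumptions; what the text confirms is that the verbs are optional callbacks, that `capabilities/0` declares them, and that `:proxy` is either a `{host, port}` tuple or a zero-arity function returning one.

```elixir
defmodule ProviderSketch do
  # Hypothetical behaviour: names and types are illustrative, not the real contract.
  @callback capabilities() :: [:fetch | :crawl | :search]
  @callback fetch(url :: String.t(), opts :: keyword()) :: {:ok, map()} | {:error, term()}
  @callback crawl(target :: String.t() | [String.t()], opts :: keyword()) ::
              {:ok, [map()]} | {:error, term()}
  @callback search(query :: String.t(), opts :: keyword()) :: {:ok, [map()]} | {:error, term()}
  # Every verb is optional, so a provider implements only what it can do.
  @optional_callbacks fetch: 2, crawl: 2, search: 2

  # Resolve the proxy knob: a {host, port} tuple, or a zero-arity function returning one.
  def proxy(fun) when is_function(fun, 0), do: fun.()
  def proxy({host, port}), do: {host, port}
end

defmodule ProviderSketch.SearchOnly do
  @behaviour ProviderSketch

  # Declare exactly what is implemented; the dispatcher falls back for the rest.
  @impl true
  def capabilities, do: [:search]

  @impl true
  def search(query, _opts) do
    {:ok, [%{title: query, url: "https://example.com", snippet: ""}]}
  end
end
```

The function form of the proxy is what makes rotation possible: each call can return a different `{host, port}`.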

who gets to BROWSE

Browse is a capability with three consumer surfaces, and a policy that decides who reaches it. The surfaces all return the same structured page; the policy decides whether the call links at all.

surface	caller	granted by	what comes back
agent tools	an agent calling `web_search` or `fetch`	the agent's profile	bulleted title / url / snippet text
`browse-fetch`	a WASM component through the Dock	`network` · `posix`	a JSON page string — the component never opens a socket
`POST /api/browse`	an authed caller on the control plane	the auth plug	org or JSON, per the request

The verdict of that table: same capability, three doors, one shape coming back. The web_search agent tool is the research capability the agents regained when the native run/curl hatch was removed — it calls Browse's search and returns a clean bulleted list. The Dock surface is the strict one: capability "browse" becomes a typed import browse-fetch, and the host owns network egress so the component never opens a socket — no wasi:http or sockets are granted. The HTTP surface is POST /api/browse on the authed control plane, where "as": "org" returns crawl org and "as": "json" returns a page summary.

And the gate is by construction, not by checkbox. Browse is granted by the network and posix policy profiles — and denied by minimal and compute. A component that hard-imports browse-fetch under minimal doesn't get a polite runtime error; it fails to link. The denial is in the wiring, proven end-to-end by a real test that runs a prebuilt probe component against a local server and watches the unprivileged build refuse to come together.
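"Fails to link" can be read as a check that runs before any component code does. The sketch below models that idea only; it is not the Dock's linker. The module `LinkSketch`, the error tuple, and the import-to-capability map are hypothetical, and the grants table lists only the browse capability (the real profiles presumably grant other things too).

```elixir
defmodule LinkSketch do
  # Which profiles grant the browse capability, per the text above.
  @grants %{minimal: [], compute: [], network: [:browse], posix: [:browse]}

  # Hypothetical mapping from a typed import name to the capability behind it.
  @import_caps %{"browse-fetch" => :browse}

  # Linking succeeds only if every import's capability is granted by the profile.
  def link(profile, imports) do
    granted = Map.fetch!(@grants, profile)
    missing = Enum.reject(imports, &(Map.get(@import_caps, &1) in granted))

    case missing do
      [] -> :ok
      _ -> {:error, {:unresolved_imports, missing}}
    end
  end
end
```

The point of the shape is that denial happens at assembly time: under `:minimal` there is never a running component to receive a runtime error.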

what the free browser ISN'T

Honesty section — and every limit here has its escape hatch built into the same design, which is the whole point of the slot.

No JavaScript, no bot-wall victories. Native fetches HTML as served; a single-page app or a wall returns thin or nothing. The hatch: one config line swaps in Firecrawl, which renders JS and defeats walls.
The TLS fingerprint approximates. The handshake posture is real and controllable, but byte-exact JA3/JA4 is future work. The hatch: the proxy knob and an external provider for the cases that need exactness today.
Extract is regex-grade. Title, meta, headings, links, text — no CSS selectors or table parsing yet. The hatch: Floki is the stated upgrade, and the page contract won't move when it lands.
Keyless search dies on datacenter IPs. The captcha wall is real and silent in the cloud. The hatch: one env var — $DATAFORSEO_AUTH — or a configured search provider.
Proxied fetches serialize. :httpc's proxy setting is process-global, so proxied requests go one at a time by design. The hatch: a per-proxy connection pool is a stated follow-up; unproxied fetches stay concurrent.
One proven external provider. Firecrawl ships; an Exa module does not exist yet. The hatch: the slot is the same regardless — anything that returns the page shape drops in.

questions people actually ASK

Is it really free out of the box?

Yes. The native browser is pure BEAM — Erlang's own TLS and HTTP, a regex extractor, a concurrent crawler, and keyless SERP scraping. No keys, no external service, no meter. Your agent can fetch, crawl, and search the web the minute the engine is up, and what comes back is structured — a parsed page or an org node, not a wall of HTML.

Why is search empty on my cloud deploy?

Because keyless SERP scraping returns nothing from datacenter IPs — the engines serve captchas to non-residential traffic, silently. This is the war story written into the code. The fix is one secret: fly secrets set DATAFORSEO_AUTH=$(printf 'login:password' | base64), and search now goes DataForSEO-first with the scrape cascade as fallback. No workbook changes.

Do my agents hold the Firecrawl key?

No — and that's the same pattern as secrets generally. FIRECRAWL_API_KEY and DATAFORSEO_AUTH live in the engine's environment and never enter the sandbox. The workload asks Browse to fetch; the engine holds the credential and makes the call. A prompt-injected agent can't leak a key it was never handed.

Can a workbook just open a socket instead?

Only if it's granted the network capability — and even then, browse is the brokered lane that exists so it usually doesn't have to. Through the Dock, the component imports browse-fetch and the host owns egress; the component never sees a socket. Raw network is a specific, separate grant, denied by default, and reserved for when brokered browsing genuinely isn't enough.

Does it respect robots.txt?

Honestly: there's no robots.txt handling in the code today. Native fetch and crawl don't read or honor a robots file — crawl limits itself to same-host links under a page and depth cap, but that's politeness by accident, not by policy. If you need robots compliance, that's a gap to close, and we'd rather say so than imply a check that isn't there.

Does the caller know which provider ran?

No, and that's the design. Native, Firecrawl, a future search provider — they all converge on one page shape, because external providers run their output through the same Extract.parse the native browser uses. You change the provider in one config line; every caller, including brandnana's harvest, keeps working untouched.

keep GOING

Browsing is the capability and grant model from the parent lesson, applied to the web. The neighbors below are the rest of that doctrine.

The Nexus: the parent (capabilities, grants, the exfiltration framing)
Agents: who calls web_search and fetch
The Dock: where browse-fetch is a typed import
Org, the grammar: the format a page comes home as