AgentBrowser
A browser runtime built for AI agents rather than retrofitted for them: an API that speaks goals, a real visible cursor, per-site memory, and a replayable audit trail.
Problem
Every browser-automation tool was built for humans first and bolted onto agents later. These tools speak in DOM operations, dump thousands of tokens of HTML, re-learn each site on every run, and leave no audit trail. An agent does not read a screen the way a person does, so the whole interface is wrong for the job.
Architecture
The API takes goals (act({ goal: 'submit the payment form' })), and the
runtime resolves them with DOM heuristics first and a vision model only when the
DOM is ambiguous, returning a compact structured summary of what is actionable
instead of an 8,000-token HTML dump. Input goes through raw mouse and keyboard
events sent over the Chrome DevTools Protocol (CDP), with the pointer moving
along a Bezier trajectory, so the cursor is real and visible.
Every action is verified with a DOM diff and recorded to a JSONL trace you can
replay. A per-site action memory persists how a goal was satisfied and skips the
model entirely on repeat visits. It runs over real Chromium and is
provider-agnostic across roughly seventeen LLM backends behind one environment
variable, exposed as a library, an MCP server, and HTTP/WS/SSE.
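
The resolution order described above (remembered path first, then DOM heuristics, then a vision model only when the DOM is ambiguous) can be sketched roughly as follows. This is a minimal illustration, not AgentBrowser's actual code: the names resolveGoal, scoreCandidates, siteMemory, and the word-overlap scoring with a 0.2 ambiguity margin are all assumptions made for the example, and the vision model is stood in for by a plain callback.

```typescript
// Hypothetical types and names; none of them come from AgentBrowser itself.
type PageElement = { selector: string; label: string };
type Candidate = PageElement & { score: number };
type Resolution = { selector: string; source: "memory" | "dom" | "vision" };
type VisionFallback = (goal: string, candidates: Candidate[]) => string;

// Per-site action memory, keyed by "origin::goal" (an assumed key scheme).
const siteMemory = new Map<string, string>();

// Toy DOM heuristic: score each element by the share of goal words
// (longer than three characters) that appear in its visible label.
function scoreCandidates(goal: string, elements: PageElement[]): Candidate[] {
  const words = goal.toLowerCase().split(/\s+/).filter((w) => w.length > 3);
  return elements
    .map((el) => {
      const label = el.label.toLowerCase();
      const hits = words.filter((w) => label.includes(w)).length;
      return { ...el, score: words.length === 0 ? 0 : hits / words.length };
    })
    .sort((a, b) => b.score - a.score);
}

function resolveGoal(
  origin: string,
  goal: string,
  elements: PageElement[],
  vision: VisionFallback,
  margin = 0.2,
): Resolution {
  const key = `${origin}::${goal}`;

  // 1. Repeat visit: reuse the remembered selector and skip every model call.
  const remembered = siteMemory.get(key);
  if (remembered !== undefined) {
    return { selector: remembered, source: "memory" };
  }

  // 2. DOM heuristics first.
  const ranked = scoreCandidates(goal, elements);
  if (ranked.length === 0) {
    throw new Error(`no actionable elements for goal: ${goal}`);
  }
  const [best, second] = ranked;

  // 3. Fall back to the vision model only when the DOM is ambiguous:
  //    nothing matched, or the top two candidates are too close to call.
  const ambiguous =
    best.score === 0 ||
    (second !== undefined && best.score - second.score < margin);

  const resolution: Resolution = ambiguous
    ? { selector: vision(goal, ranked), source: "vision" }
    : { selector: best.selector, source: "dom" };

  // Simplification: a real runtime would presumably store the path only
  // after the action has been verified.
  siteMemory.set(key, resolution.selector);
  return resolution;
}
```

In this sketch the first call for a given site and goal pays for scoring (and possibly the vision callback), while every later call with the same key returns from the map without touching either.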
Outcome
Repeat runs collapse to a fraction of the tokens and wall-clock time once a site’s path is remembered, and the trace makes “the agent did something weird” reproducible. Full writeup: Goals, not selectors.
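
To make the reproducibility point concrete, here is a small sketch of how a JSONL trace could be written, read back, and replayed to find the first step that diverges. The entry shape (step, goal, selector, domChanged) and the function names are assumptions for illustration; the actual trace schema is not specified here.

```typescript
// Hypothetical trace entry: one JSON object per line of the trace file.
type TraceEntry = {
  step: number;
  goal: string;
  selector: string;
  domChanged: boolean; // outcome of the post-action DOM diff
};

// Serialize entries as JSONL: one compact JSON object per line.
function toJsonl(entries: TraceEntry[]): string {
  return entries.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

// Parse JSONL back into entries, ignoring blank lines.
function parseJsonl(text: string): TraceEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TraceEntry);
}

// Replay each recorded step through an executor and return the step number
// of the first one whose outcome differs from the recording, or null if the
// whole trace reproduces.
function replay(
  entries: TraceEntry[],
  execute: (entry: TraceEntry) => boolean,
): number | null {
  for (const entry of entries) {
    if (execute(entry) !== entry.domChanged) {
      return entry.step;
    }
  }
  return null;
}
```

With a trace like this, a report that the agent did something odd reduces to replaying the file and inspecting the step number that replay returns.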