3 min read · TOOLING · AI · SECURITY

Exposing the whole Montoya API: driving Burp Suite from an AI agent

Why a thin MCP wrapper around Burp is useless to an autonomous agent, and what it took to expose a functional 100% of the Montoya API, including a reflection bridge into other installed extensions.

[Figure: the MCP server, plus the reflection bridge into other installed extensions. The agent talks to the MCP server extension, which reaches Burp core (Scanner, Intruder, Collaborator) through the Montoya API and folds other extensions, discovered by reflection, into the tool list.]

For most of a year my hunting loop had a hole in the middle of it. The agent could reason about a target, write payloads, and read recon output, but the moment real work needed Burp (repeat a request with a tweaked header, fire an Intruder run, mint a Collaborator payload, decode a JWT), a human had to do it by hand and paste the result back. The agent was blind to the one tool that holds all the state.

The official PortSwigger MCP reference and the third-party clones all wrap the same thin slice of Montoya: enough to list proxy history and send a request to Repeater. That is a chatbot integration. An agent driving a hunt needs the whole surface: Scanner, Intruder, Collaborator, the JWT and OIDC editors, the GraphQL tools, session handling, the sitemap, scope, and the project-wide search.

What “functional 100%” actually means

Montoya is large, and a lot of it is callbacks and UI affordances that don’t map cleanly to a stateless tool call. The work wasn’t typing out 152 wrappers; it was deciding, per capability, what an agent needs to do with it, then giving it a tool shaped for that task rather than for the Java method signature.

json
// A tool the agent actually uses, not a 1:1 Montoya mirror
{
  "name": "burp.intruder.run",
  "arguments": {
    "base_request_id": "req_8842",
    "insertion_points": [{ "name": "uid", "type": "path-segment", "index": 3 }],
    "payloads": { "type": "numbers", "from": 1, "to": 5000 },
    "throttle_ms": 40
  }
}
// returns a histogram of status / length / time, not 5000 raw responses

The return shape matters as much as the call. A naive wrapper hands the model 5,000 raw responses and blows the context window on the first Intruder run. The useful version returns a status/length/timing histogram and lets the agent zoom into the outliers, the same way a human reads the Intruder results table.
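
A minimal sketch of that return shape, in plain Java rather than the project's Kotlin. Everything here is hypothetical and not taken from the extension or from Montoya: the `Outcome` and `Bucket` records, the `summarize` helper, and the choice to bucket by status and rounded response length. It only illustrates collapsing thousands of results into a few rows, rarest first, with a handful of sample payloads the agent can zoom into.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class IntruderSummary {
    /** Groups outcomes by (status, length rounded down to bucketSize), rarest bucket first. */
    static List<Bucket> summarize(List<Outcome> outcomes, int bucketSize, int samplesPerBucket) {
        Map<String, List<Outcome>> groups = new TreeMap<>();
        for (Outcome o : outcomes) {
            int lengthBucket = (o.length() / bucketSize) * bucketSize;
            groups.computeIfAbsent(o.status() + ":" + lengthBucket, k -> new ArrayList<>()).add(o);
        }
        List<Bucket> buckets = new ArrayList<>();
        for (List<Outcome> group : groups.values()) {
            Outcome first = group.get(0);
            long maxMillis = group.stream().mapToLong(Outcome::millis).max().orElse(0);
            // Keep only a few payloads per bucket so the summary stays small.
            List<String> samples = group.stream().limit(samplesPerBucket).map(Outcome::payload).toList();
            buckets.add(new Bucket(first.status(), (first.length() / bucketSize) * bucketSize,
                    group.size(), maxMillis, samples));
        }
        // Rare buckets are the outliers worth inspecting first.
        buckets.sort(Comparator.comparingInt(Bucket::count));
        return buckets;
    }

    public static void main(String[] args) {
        List<Outcome> outcomes = new ArrayList<>();
        for (int uid = 1; uid <= 5000; uid++) {
            boolean hit = uid == 1337; // made-up data: pretend one ID behaves differently
            outcomes.add(new Outcome(String.valueOf(uid), hit ? 200 : 403, hit ? 2210 : 512, hit ? 95 : 40));
        }
        summarize(outcomes, 100, 3).forEach(System.out::println);
    }
}

/** Hypothetical per-response summary; not a Montoya type. */
record Outcome(String payload, int status, int length, long millis) {}

/** One histogram row: how many responses shared a status and a length bucket. */
record Bucket(int status, int lengthBucket, int count, long maxMillis, List<String> samplePayloads) {}
```

With the made-up data in `main`, 5,000 outcomes collapse into two rows, and the single odd response sorts to the top.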

The part I’m actually proud of: the cross-extension bridge

The non-obvious problem: my Burp already had other extensions loaded, each with capabilities I wanted the agent to reach. Montoya gives an extension no first-class way to call another extension. So the agent could drive Burp’s built-ins but was walled off from everything else in the install.

The bridge walks the loaded-extension registry and uses reflection to invoke the public surfaces of sibling extensions, then re-exposes those as MCP tools at runtime. The agent doesn’t know or care that a capability came from a different extension; it sees one flat tool list. This is the piece I haven’t seen anywhere else, and it turned “drive Burp” into “drive my whole Burp install.”

text
agent ──MCP──▶ burp-mcp-ultimate ──Montoya──▶ Burp core (Scanner/Intruder/…)
                       │
                       └──reflection──▶ other loaded extensions ──▶ re-exposed as MCP tools
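
The discovery half of that picture can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the extension's code: how the sibling objects are obtained is out of scope (the `siblings` map is simply handed in), `FakeJwtExtension` is a stand-in class, and the `ext.<name>.<method>` naming scheme is invented for the example. Only `burp.intruder.run` comes from the tool shown earlier.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class BridgeDiscovery {
    /** Merges built-in tool names with the public instance methods each sibling declares. */
    static List<String> flatToolList(List<String> builtIns, Map<String, Object> siblings) {
        // A sorted set gives a stable order and collapses overloads to one name.
        TreeSet<String> tools = new TreeSet<>(builtIns);
        for (Map.Entry<String, Object> sibling : siblings.entrySet()) {
            for (Method m : sibling.getValue().getClass().getDeclaredMethods()) {
                int mod = m.getModifiers();
                if (Modifier.isPublic(mod) && !Modifier.isStatic(mod) && !m.isSynthetic()) {
                    tools.add("ext." + sibling.getKey() + "." + m.getName());
                }
            }
        }
        return new ArrayList<>(tools);
    }

    public static void main(String[] args) {
        List<String> tools = flatToolList(
                List.of("burp.intruder.run"),
                Map.of("jwt", new FakeJwtExtension()));
        tools.forEach(System.out::println);
    }
}

/** Stand-in for a sibling extension's public surface; purely hypothetical. */
final class FakeJwtExtension {
    public String decode(String token) { return "decoded:" + token; }
    public String sign(String claims) { return claims + ".sig"; }
    private void warmCache() { } // private, so it never becomes a tool
}
```

The caller sees a single list of names and has no way to tell which entries came from Burp itself and which were discovered.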

Reflection is a sharp tool. Sibling extensions change their internals without warning, so every bridged call is defensive: resolve the method fresh, expect it to be gone, and degrade to a typed error rather than crashing the MCP server and taking the agent’s whole session down with it.
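
One way to express that defensive posture, again as a hedged plain-Java sketch rather than the real bridge: `SafeBridge`, `BridgeResult`, the error codes, and `FakeSibling` are all names made up for illustration. The method is looked up on every call, a missing method becomes a result value instead of an exception, and anything the sibling throws is caught at the boundary.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class SafeBridge {
    /** Invokes a sibling method by name and never lets a failure escape as an exception. */
    static BridgeResult call(Object target, String methodName, Object... args) {
        try {
            // Resolve on every call: a sibling update may have renamed or removed the method.
            Method method = resolve(target.getClass(), methodName, args.length);
            if (method == null) {
                return BridgeResult.failure("METHOD_GONE", methodName);
            }
            return BridgeResult.success(method.invoke(target, args));
        } catch (InvocationTargetException e) {
            // The sibling's own code threw; report its cause instead of propagating it.
            return BridgeResult.failure("TARGET_THREW", String.valueOf(e.getCause()));
        } catch (IllegalAccessException | IllegalArgumentException e) {
            // The method exists but no longer accepts these arguments, or is not accessible.
            return BridgeResult.failure("SIGNATURE_CHANGED", e.toString());
        }
    }

    /** Finds the first public method with the given name and parameter count, or null. */
    private static Method resolve(Class<?> type, String name, int arity) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == arity) {
                return m;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FakeSibling sibling = new FakeSibling();
        System.out.println(call(sibling, "decode", "abc"));
        System.out.println(call(sibling, "renamed", "abc"));
        System.out.println(call(sibling, "explode"));
    }
}

/** Hypothetical typed result: a value on success, a machine-readable code on failure. */
record BridgeResult(boolean ok, String errorCode, String detail, Object value) {
    static BridgeResult success(Object value) { return new BridgeResult(true, null, null, value); }
    static BridgeResult failure(String code, String detail) { return new BridgeResult(false, code, detail, null); }
}

/** Stand-in for a sibling extension; purely hypothetical. */
final class FakeSibling {
    public String decode(String token) { return "decoded:" + token; }
    public String explode() { throw new IllegalStateException("boom"); }
}
```

A real bridge across extension class loaders would likely need more care around accessibility and argument conversion; the point of the sketch is only that every failure mode ends in a value the MCP layer can serialize.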

What it cost, what it’s worth

It’s Kotlin on JDK 21, MIT-licensed, with a 62-test smoke suite that runs in CI, because an extension that crashes Burp on load is worse than no extension. The honest limitation: “functional 100% coverage” is a claim about capability, not a guarantee that every Montoya edge case is wired. UI-bound and callback-only corners are represented by the closest agent-usable equivalent, not a literal mirror.

The lesson that generalised past this project: when you build a tool for an agent rather than a human, the interface design is the project. The Java was the easy part. Deciding what 152 things should return, and in what shape, so a model can act on them without drowning: that was the work.