Hypotheses over signatures: what I learned building an autonomous pentest agent

The hypothesis loop, gated on evidence before anything counts as a finding

A scanner fires every signature it has and prints a CSV. That is the opposite of how a researcher works. A researcher looks at a target, asks “what did the developer assume I can’t do,” forms a small number of high-value guesses, and tests the most promising one first. Project Triage is my attempt to put that loop inside an agent.

The loop is the architecture

The core is four stages: build intelligence, generate hypotheses, hunt, emit output. Recon and fingerprinting feed a hypothesis generator that produces candidate attacks ranked by dollar value, not by alphabetical tool order. The hunt loop is the classic think/act/observe/learn cycle, and the output is a validated finding plus a HackerOne-shaped report.


 Intelligence       Hypotheses        Hunt Loop          Output
 recon, JS,    -->  19 brain     -->  think/act/    -->  validated
 fingerprint,       modules,          observe/            findings,
 scope              rank by $         learn               H1 report
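
The "rank by $" stage can be pictured as an expected-value sort. The following is a minimal sketch, not Project Triage's actual scoring: the `Hypothesis` fields, the payout and probability numbers, and the idea of dividing by request cost are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A candidate attack. Every field here is hypothetical."""
    name: str
    est_bounty_usd: float  # assumed payout if the bug is real
    p_success: float       # assumed chance the guess is right, 0..1
    est_requests: int      # assumed cost to test, in HTTP requests


def expected_value_per_request(h: Hypothesis) -> float:
    """Expected payout divided by the cost of testing the hypothesis."""
    return (h.est_bounty_usd * h.p_success) / max(h.est_requests, 1)


def rank(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Return hypotheses with the most promising one first."""
    return sorted(hypotheses, key=expected_value_per_request, reverse=True)


candidates = [
    Hypothesis("reflected XSS on /search", 500.0, 0.30, 20),
    Hypothesis("IDOR on /api/invoices/{id}", 3000.0, 0.20, 10),
    Hypothesis("SSRF via webhook URL", 8000.0, 0.05, 40),
]
ranked = rank(candidates)
print([h.name for h in ranked])
```

With these made-up numbers the IDOR guess is tested first: it has neither the largest payout nor the highest probability, but it has the best expected return per request spent.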

What makes a single step honest is everything wrapped around the action: read the pentest tree state, select from a constrained action set, check it isn’t a repeat, check it’s in scope, throttle, execute, classify the response, summarize the output, then generate follow-ups. The model is never trusted to free-form its way through that. Each gate is code.
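
A rough sketch of what "each gate is code" could look like for the pre-execution checks. The names (`StepGate`, `Action`, `ALLOWED_ACTIONS`) and the 200-character summary cutoff are hypothetical, and the executor is injected so the example runs without touching a network. Response classification and follow-up generation are left out here.

```python
import time
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlsplit

# Hypothetical constrained action set; the real agent's set is not shown in this post.
ALLOWED_ACTIONS = {"http_get", "http_post"}


@dataclass(frozen=True)
class Action:
    kind: str
    url: str


@dataclass
class StepGate:
    in_scope_hosts: set[str]
    min_interval_s: float = 0.0
    seen: set[Action] = field(default_factory=set)
    last_run: float = float("-inf")  # so the first action never waits

    def run(self, action: Action, execute: Callable[[Action], str]) -> dict[str, str]:
        # Gate 1: the action must come from the constrained set.
        if action.kind not in ALLOWED_ACTIONS:
            return {"status": "rejected", "reason": "unknown action"}
        # Gate 2: never repeat an identical action.
        if action in self.seen:
            return {"status": "rejected", "reason": "repeat"}
        # Gate 3: the target host must be in scope.
        if urlsplit(action.url).hostname not in self.in_scope_hosts:
            return {"status": "rejected", "reason": "out of scope"}
        # Gate 4: throttle by sleeping out the rest of the minimum interval.
        wait = self.min_interval_s - (time.monotonic() - self.last_run)
        if wait > 0:
            time.sleep(wait)
        self.seen.add(action)
        self.last_run = time.monotonic()
        raw = execute(action)
        # Summarize (here: truncate) so tool output cannot flood the model's context.
        return {"status": "executed", "summary": raw[:200]}


gate = StepGate(in_scope_hosts={"app.example.test"})
fake_execute = lambda action: "HTTP/1.1 200 OK\n" + "x" * 5000  # stand-in for a real tool
first = gate.run(Action("http_get", "https://app.example.test/login"), fake_execute)
again = gate.run(Action("http_get", "https://app.example.test/login"), fake_execute)
stray = gate.run(Action("http_get", "https://other.example.test/"), fake_execute)
print(first["status"], again["reason"], stray["reason"])
```

The point of the shape is that the model only proposes an `Action`; whether it runs is decided by plain code that cannot be talked out of its checks.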

The 19 brain modules are where the reasoning lives

The agent doesn’t just run tools; it reasons about the target before it touches it. A few of the modules that earned their place:

The Assumption Engine asks the one question that finds real bugs: “what did the developer assume I can’t do.” Most access-control flaws live in that gap.
The Confusion Engine looks for semantic disagreements between what the proxy sees and what the backend does, the seam where parser-differential bugs hide.
The Chain Engine combines primitives, for example turning an SSRF plus an instance metadata endpoint into a cloud-credential takeover, instead of reporting each half as a low.
The LATS Explorer does tree search with verbal reflections on failure, so a dead branch teaches the next attempt instead of being silently abandoned.
A Scale Model judges whether the target is a startup or an enterprise, because the same endpoint deserves different hypotheses depending on who built it.
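
To make the Chain Engine idea concrete, here is a small sketch of chaining as a rule lookup: each rule names the primitives it needs and the outcome they unlock together. The `ChainRule` structure, the primitive labels, and the severity string are assumptions for illustration; only the SSRF-plus-metadata example comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainRule:
    """Hypothetical rule: if all required primitives are present, report the chain."""
    requires: frozenset[str]
    outcome: str
    severity: str


# One rule, mirroring the SSRF example; a real table would hold many more.
RULES = [
    ChainRule(
        requires=frozenset({"ssrf", "metadata_endpoint_reachable"}),
        outcome="cloud-credential takeover",
        severity="critical",
    ),
]


def find_chains(primitives: set[str]) -> list[ChainRule]:
    """Return every rule whose required primitives have all been observed."""
    return [rule for rule in RULES if rule.requires <= primitives]


observed = {"ssrf", "metadata_endpoint_reachable", "verbose_errors"}
for chain in find_chains(observed):
    print(f"{chain.severity}: {chain.outcome}")
```

With only `{"ssrf"}` observed, `find_chains` returns nothing, which is the case a per-signature scanner would report as an isolated low.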

What actually worked: treating the agent as a system that fails

The thing I underestimated going in was how much of this is failure engineering, not intelligence engineering. A capable model with no scaffolding gets lost. It repeats the same request, talks itself into a hallucinated finding, or drowns in its own tool output. Four pieces of robustness, each grounded in published agent research, did the heavy lifting.

A Response Classifier sits in front of all of it, detecting WAF vendors before the agent reasons about a response, paired with an adaptive throttle that backs off on blocks and rate limits. An agent that doesn’t notice it’s been soft-blocked will cheerfully build a theory on garbage.
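
A simplified sketch of how a classifier and throttle might pair up. The labels, the backoff factors, and the single Cloudflare rule (a `cf-ray` header or `Server: cloudflare` on a 403) are illustrative assumptions; real WAF detection needs far more signals than this, and a plain 403 can also be an ordinary application response.

```python
from dataclasses import dataclass


def classify(status: int, headers: dict[str, str]) -> str:
    """Give a response a coarse label before any reasoning happens on its body."""
    h = {k.lower(): v.lower() for k, v in headers.items()}
    if status == 429:
        return "rate_limited"
    if status == 403 and ("cf-ray" in h or h.get("server") == "cloudflare"):
        return "waf_block:cloudflare"
    if status == 403:
        return "forbidden"
    return "ok"


@dataclass
class AdaptiveThrottle:
    """Doubles the delay on blocks and rate limits, decays it slowly otherwise."""
    delay_s: float = 0.5
    min_s: float = 0.5
    max_s: float = 60.0

    def update(self, label: str) -> float:
        if label == "rate_limited" or label.startswith("waf_block"):
            self.delay_s = min(self.max_s, self.delay_s * 2)
        else:
            self.delay_s = max(self.min_s, self.delay_s * 0.9)
        return self.delay_s


throttle = AdaptiveThrottle()
for status, headers in [(429, {}), (403, {"CF-RAY": "abc123"}), (200, {})]:
    label = classify(status, headers)
    print(label, throttle.update(label))
```

Backing off multiplicatively and recovering slowly is a deliberate asymmetry in this sketch: a block is strong evidence the current pace is too fast, while one clean response is weak evidence that it is safe to speed up.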

What was genuinely hard, and the honest scope

Two things stayed hard. The first is hypothesis ranking: “rank by bounty value” is easy to say but a real modeling problem, because value depends on the target, the program, and what the chain could become, not on the bug class in isolation. The second is the validation boundary. An autonomous agent’s worst failure mode is a confident, well-written, false finding, so a quality gate has to sit between “the agent thinks it found something” and “this becomes a report.”
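
One way to picture that quality gate is as a checklist a candidate must pass in code before it can become a report. This is a sketch under assumptions: the `CandidateFinding` fields, the three criteria, and the default of two reproductions are hypothetical, not the project's actual gate.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateFinding:
    """What the agent believes it found. All fields are hypothetical."""
    title: str
    evidence: list[str] = field(default_factory=list)  # e.g. captured request/response pairs
    reproductions: int = 0        # independent successful replays
    control_passed: bool = False  # a benign control request behaved differently


def quality_gate(finding: CandidateFinding, min_repro: int = 2) -> tuple[bool, list[str]]:
    """Return (accepted, reasons_for_rejection)."""
    reasons: list[str] = []
    if not finding.evidence:
        reasons.append("no captured evidence")
    if finding.reproductions < min_repro:
        reasons.append(f"reproduced {finding.reproductions} time(s), need {min_repro}")
    if not finding.control_passed:
        reasons.append("negative control missing or failed")
    return (not reasons, reasons)


claimed = CandidateFinding("IDOR on invoices", evidence=["req/resp #1"], reproductions=1)
accepted, reasons = quality_gate(claimed)
print(accepted, reasons)
```

Returning the rejection reasons rather than a bare boolean lets the hunt loop treat a failed gate as a to-do list (replay it again, run the control) instead of a dead end.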

The honest framing: it runs on cloud Claude for the strong-reasoning path or a local model for privacy and cost, and the local path is only as good as the model you give it. The 71K lines are mostly the unglamorous 80%: the state management, the classifiers, the summarizers, the throttle. The lesson that generalized: with a frontier model, raw capability is not the bottleneck. The scaffolding that keeps a long-horizon agent grounded is the product.