Anatomy of an autonomous bug bounty pipeline

Five tools as job servers behind one MCP entry point

The naive way to give an agent a scanner is to shell out and wait. That breaks in two ways: a scan that takes four minutes stalls the whole agent loop, and the raw output that comes back blows the context window. BountyHound’s architecture is a direct response to both problems, and the part I want to argue for is the validation boundary that sits at the end.

One pattern, repeated five times

Every tool is the same shape, which is the whole point. Nuclei, SQLMap, Nmap, Ffuf, and Amass each run as an independent FastAPI service on its own port, sharing a base library (bh-core) for the job model, state manager, and persistence. The agent never talks to them directly. It talks to one unified MCP server that proxies to all five.

Claude Code
    |
MCP unified server
    |-- Nuclei  (8188)  template-based vuln scanning
    |-- SQLMap  (8189)  SQL injection testing
    |-- Nmap    (8190)  network recon
    |-- Ffuf    (8191)  web fuzzing
    +-- Amass   (8192)  subdomain enumeration

The contract is identical across all of them: POST /api/{action} starts a job and returns a job_id immediately; GET /api/{action}/{job_id} polls status and results. A scan never blocks the caller. The job runs as a background subprocess and moves through one state machine: it starts as running and ends as completed, cancelled, or error. That uniformity is what lets the agent reason about five different tools as if they were one: start, poll, read, repeat.
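A minimal sketch of that start-then-poll contract, roughly as the MCP proxy might drive it. The ports come from the diagram above. The host, the `scan` action name, the `status` field name, and both helper functions are assumptions for illustration, and the HTTP call is injected as `fetch` so the loop does not depend on any particular client library.

```python
import time
from typing import Callable, Dict, Optional

# Ports are from the diagram; host and helper names are hypothetical.
TOOL_PORTS = {"nuclei": 8188, "sqlmap": 8189, "nmap": 8190, "ffuf": 8191, "amass": 8192}
TERMINAL_STATES = {"completed", "cancelled", "error"}


def job_url(tool: str, action: str, job_id: Optional[str] = None,
            host: str = "127.0.0.1") -> str:
    """Build the start URL (no job_id) or the poll URL (with job_id)."""
    base = f"http://{host}:{TOOL_PORTS[tool]}/api/{action}"
    return base if job_id is None else f"{base}/{job_id}"


def wait_for_job(fetch: Callable[[str], Dict], url: str, interval: float = 2.0,
                 max_polls: int = 150,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll a job URL until it reports a terminal state, then return the job."""
    for _ in range(max_polls):
        job = fetch(url)  # assumed to return the decoded JSON job object
        if job["status"] in TERMINAL_STATES:
            return job
        sleep(interval)
    raise TimeoutError(f"job at {url} did not reach a terminal state")
```

Bounding the number of polls is a design choice in this sketch, not something the suite is stated to do; it keeps a stuck job from stalling the loop in the same way a blocking scan would.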

The pipeline is a chain of cheap-then-expensive

Autonomous hunting is recon, then hypothesis, then test, then validate, then evidence. The async job model maps cleanly onto that because each stage feeds the next without a human in the loop.

The cross-tool workflow is the architecture working as intended: Nmap discovers a service on 8080, Ffuf enumerates directories under it, Nuclei scans the endpoints Ffuf found. Nobody pasted anything between steps.
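The chain above can be sketched as a function that threads each job's results into the next job's parameters. Everything about the shapes here is assumed for illustration: the `run_job` callable (start a job, poll it to a terminal state, return the job dict), the parameter names, and the result keys `open_ports`, `paths`, and `findings`.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (tool, params) -> final job dict after polling.
RunJob = Callable[[str, Dict], Dict]


def completed_results(job: Dict) -> Dict:
    """Return a job's results only if it completed; otherwise an empty dict."""
    return job.get("results", {}) if job.get("status") == "completed" else {}


def hunt(run_job: RunJob, host: str) -> List[Dict]:
    """Nmap -> Ffuf -> Nuclei, each stage fed by the previous stage's results."""
    findings: List[Dict] = []
    nmap = completed_results(run_job("nmap", {"target": host}))
    for port in nmap.get("open_ports", []):
        base = f"http://{host}:{port}"
        # FUZZ is the placeholder keyword Ffuf substitutes wordlist entries into.
        ffuf = completed_results(run_job("ffuf", {"url": f"{base}/FUZZ"}))
        endpoints = [f"{base}/{path}" for path in ffuf.get("paths", [])]
        if not endpoints:
            continue  # nothing discovered, so skip the more expensive scan
        nuclei = completed_results(run_job("nuclei", {"targets": endpoints}))
        findings.extend(nuclei.get("findings", []))
    return findings
```

A stage that errors or is cancelled contributes nothing downstream, so a failed Nmap job ends the chain with no findings instead of feeding guesses to Ffuf.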

The discipline: the agent does not own “this is a finding”

Here is the part that actually keeps an autonomous pipeline honest, and it is not a tool, it is a boundary. Every tool returns the same job object, and a finding only exists inside a completed job with results the agent can point at. Three properties enforce that.

First, structured status over prose. A job is running, completed, cancelled, or error, and an error is a typed field ("Scan timed out after 300s"), not a sentence buried in stdout. An agent can recover from a typed error state; given ambiguous text, it loops forever.
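A rough sketch of what such a job object could look like, and how an agent might branch on it. The four states and the error string come from the text; the field names, the `next_step` helper, and its return values are assumptions. The suite itself uses Pydantic models, and a plain dataclass stands in here to keep the example dependency-free.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Optional


class JobStatus(str, Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    ERROR = "error"


@dataclass
class Job:
    job_id: str
    tool: str
    status: JobStatus = JobStatus.RUNNING
    results: List[Any] = field(default_factory=list)
    error: Optional[str] = None  # set only when status is ERROR


def next_step(job: Job) -> str:
    """Decide what to do from the typed status alone, never from stdout text."""
    if job.status is JobStatus.RUNNING:
        return "poll_again"
    if job.status is JobStatus.COMPLETED:
        return "read_results"
    if job.status is JobStatus.ERROR:
        return f"retry_or_skip: {job.error}"
    return "skip"  # cancelled
```

Because the status is a closed set of values, every branch is enumerable, which is what makes recovery a decision and not a parsing problem.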

Second, the agent must poll for a terminal state before it can claim anything. A hypothesis is not confirmed because a scan was started; it is confirmed when a completed job carries the result. That single rule, “no finding without a completed job,” is the gate that stops the most common autonomous failure, which is an agent narrating a vulnerability from a scan that is still running or already errored out.
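The gate can be expressed as a single check that every claim has to pass through. This is an illustrative sketch, not the suite's code: the `NoFinding` exception, the function name, and the `job_id`, `status`, and `results` keys are assumed.

```python
from typing import Any, Dict, List


class NoFinding(Exception):
    """Raised when a claim is not backed by a completed job with results."""


def results_or_refuse(job: Dict[str, Any]) -> List[Any]:
    """Return the results of a completed job, or refuse to treat it as evidence."""
    status = job.get("status")
    if status != "completed":
        raise NoFinding(f"job {job.get('job_id')} is {status!r}, not completed")
    results = job.get("results") or []
    if not results:
        raise NoFinding(f"job {job.get('job_id')} completed with no results")
    return results
```

A running job and an errored job fail the same check, so the agent cannot narrate a vulnerability from either.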

Third, evidence is the job object itself. Because every result is a persisted, structured artifact rather than scraped console text, the chain from “agent claims X” back to “here is the Nuclei job that proves X” is mechanical. That is what a triager needs and what a hallucinating agent can never produce: a reproducible artifact, not a confident summary.
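One way to make that chain mechanical is to have a finding carry a pointer to its job, not a copy of the agent's summary. The sketch below assumes a hypothetical `Finding` record and uses a plain dict as a stand-in for the persisted job store; none of these names come from bh-core.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Finding:
    claim: str         # the agent's one-line statement
    tool: str          # which tool produced the evidence
    job_id: str        # the persisted job that backs the claim
    result_index: int  # which entry in that job's results


def evidence_for(finding: Finding, store: Dict[str, Dict[str, Any]]) -> str:
    """Resolve a finding back to the exact persisted result it cites."""
    job = store[finding.job_id]  # KeyError if the job was never persisted
    if job["tool"] != finding.tool or job["status"] != "completed":
        raise ValueError("finding does not point at a completed job from that tool")
    return json.dumps(job["results"][finding.result_index], sort_keys=True)
```

A claim with no resolvable job fails at lookup time, which is the property a triager relies on.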

What the simplicity buys

The suite is deliberately boring: FastAPI, Pydantic models, background tasks, auto-cleanup that keeps the last hundred jobs per tool. Adding a sixth tool is a mechanical extension of bh-core: subclass the job model, write a state manager, write a scanner function, register the MCP tool. I kept it that plain on purpose. The intelligence belongs in the agent forming hypotheses, not in clever plumbing. The plumbing’s only job is to never lie to the agent: start fast, report status honestly, and make every finding trace back to a job that actually completed.
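The keep-the-last-hundred cleanup is a good example of how little machinery is involved. A minimal sketch, assuming one in-memory store per tool and oldest-first eviction; the `JobStore` class and its methods are hypothetical, and the real state manager may well differ, for instance by also persisting to disk.

```python
from collections import OrderedDict
from typing import Any, Dict, Optional


class JobStore:
    """Keeps at most max_jobs jobs, evicting the oldest first."""

    def __init__(self, max_jobs: int = 100) -> None:
        self.max_jobs = max_jobs
        self._jobs: "OrderedDict[str, Dict[str, Any]]" = OrderedDict()

    def add(self, job_id: str, job: Dict[str, Any]) -> None:
        self._jobs[job_id] = job
        self._jobs.move_to_end(job_id)  # re-adding a job makes it the newest
        while len(self._jobs) > self.max_jobs:
            self._jobs.popitem(last=False)  # drop the oldest entry

    def get(self, job_id: str) -> Optional[Dict[str, Any]]:
        return self._jobs.get(job_id)

    def __len__(self) -> int:
        return len(self._jobs)
```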