A toolkit per target, not a scanner per program

The promise of a universal scanner is that you point it at anything and it finds the bugs. The reality is that it finds the intersection of bugs across all targets, which is the shallow set everyone else already scanned for. The interesting vulnerabilities live in the seams specific to one target: the way this particular gateway proxies to that particular internal service, the version of a component nobody else is running, the assumption two systems make about each other. A template cannot encode a seam it does not know exists.

Why the generic template loses

A scanner template is a fixed list of checks frozen before it ever saw your target. It cannot reason that the presence of vendor X in front of framework Y implies a specific request-parsing disagreement worth probing. It fires every payload at every endpoint, generating volume that you then have to triage, and volume is not signal. The worst part is the false confidence: a clean scanner run reads as “nothing here,” when it actually means “nothing from my fixed list.” Those are different statements and only the second one is true.

Build the toolkit after recon, shaped to what you found

The sequence is: detect first, then build probes for what you detected. If recon surfaces a reverse proxy in front of microservices, I write a small probe that injects traversal into the path segments that look like upstream service names, because that is where this architecture leaks. If I find a specific component version, the probe targets the behavior that version actually has, not a generic CVE pattern. Each probe is ten or thirty lines, cut to one hypothesis about one seam. None of it is reusable across the next target, and that is fine. Cheap to write, sharp where it matters, thrown away after.

Shape the output for the thing that has to read it

When an agent drives the probes, output format stops being cosmetic and becomes the bottleneck. A probe that dumps a full response body burns the context window on noise and starves the reasoning step. So I shape output deliberately: a one line verdict, the load-bearing delta (status code, a header that changed, the byte that differs between two requests), and the raw evidence behind a fold the agent opens only when it decides to. The discipline is the same one that keeps a human triager reading: surface the signal, hide the bulk, never make the reader reconstruct the finding from a wall of text.

plaintext

probe: path-traversal-via-proxy-segment
verdict: REACHED internal /admin/health via /api/{svc}/../admin/health
delta: 200 with internal build header X-Build-Internal present
evidence: <folded raw request/response>

Use real out-of-band sinks, not hopeful reflection

A probe that only checks for reflected output silently fails on the entire class of blind bugs: server-side request forgery with no response body, blind injection that processes but never echoes, anything where the interesting effect happens out of band. So the toolkit wires probes to a real out-of-band sink, a collaborator endpoint that records the callback, so a payload that produces no visible change in the response still proves itself when the target reaches out to a host only my payload could have named. Half of high-severity findings are invisible in the response and obvious in the sink.

Do not ship toy proof of concept

The last rule is the one that separates a finding that gets paid from one that gets closed. A proof that injects alert(1) into a field you control, or reaches an endpoint that returns your own data, demonstrates nothing. The probe has to drive the hypothesis to actual impact: not “the parameter is reflected” but “here is the cross-account data it exposes,” not “the request is proxied” but “here is the internal endpoint it reached that the public API never meant to expose.” A toy proof of concept tells the triager you stopped one step short of knowing whether it was real. The target-shaped toolkit exists precisely to take that last step, because that step is the entire finding.