A local-first reverse-engineering agent, and the honest limits of one
Somnus drives Ghidra, angr, Frida and AFL++ through a small local model to triage binaries, no API keys, no network. It works end-to-end on ret2win. Here is what that proves, and what it very much does not.
There are two ways to write about a project like this. One inflates a working prototype into a breakthrough. The other tells you exactly where the edge of the tested behaviour is. Reverse engineering punishes the first kind of writing, so this is the second kind.
What it does
Point Somnus at a binary and it runs a fixed pipeline, then hands control to the model:
- Triage: protections, imports, dangerous calls, a function inventory.
- Decompile: every function through Ghidra, cached per target so you pay the slow step once.
- Pattern match: the decompiled C is scanned for classic shapes: stack buffer overflow, format string, command injection, integer overflow.
- Reason: a local model reads a compacted preview and calls follow-up tools: zoom into one function, check reachability, propose a PoC shape.
- Persist: findings and artifacts go to SQLite, so a run is resumable and queryable instead of a wall of terminal scrollback.
The design constraint that shaped everything: no API keys, no network calls. A
binary you’re reversing is often something you can’t send to a cloud endpoint,
and a tool that assumes you can is a tool you can’t use on the interesting
targets. So the model is qwen3:8b over Ollama, small enough to run locally,
tool-use-capable enough to drive the loop.
The honest status
Somnus is verified end-to-end on ROP Emporium’s ret2win (x86_64). On that
target it correctly identifies the read() overflow in pwnme, resolves the
ret2win gadget address, and computes the 40-byte overflow offset. Start to
finish, no human in the loop for the analysis.
That is a real result. It is also a simple stack BOF in a CTF binary built to be solved, and generalisation beyond that class is not yet tested. I haven’t proven it on stripped production binaries, on heap bugs, on anything with real anti-analysis. Saying “it solves ret2win” is true; implying “it reverse-engineers arbitrary binaries” would not be.
Why local-first mattered more than I expected
The no-network constraint started as a privacy decision and turned into a design forcing function. It killed any temptation to lean on a giant model to paper over a weak pipeline, if the local 8B can’t follow the loop, the loop is wrong, not the model. The tools do the heavy lifting; the model’s job is narrow: read the preview, pick the next tool, decide when there’s enough to call a finding. That division of labour is the part I’d keep in the next version of this, whatever the model underneath turns out to be.