The code is open source; you can run it yourself with the Harbor framework:
git clone git@github.com:QuesmaOrg/BinaryAudit.git
export OPENROUTER_API_KEY=...
harbor run --path tasks --task-name 'lighttpd-*' --agent terminus-2 --model openrouter/anthropic/claude-opus-4.6 --model openrouter/google/gemini-3-pro-preview --model openrouter/openai/gpt-5.2 --n-attempts 3
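If you want to swap models in and out, the command above can be assembled from a list instead of being typed by hand. The following is only a sketch: every flag is copied from the command above, the task pattern is quoted so the shell does not expand it, and `DRY_RUN` is a hypothetical variable introduced here for illustration, not something Harbor provides.

```shell
#!/bin/sh
# Sketch: assemble the harbor command from a model list.
# All flags are copied from the command above; nothing new is assumed about harbor.
set -eu

MODELS="openrouter/anthropic/claude-opus-4.6 openrouter/google/gemini-3-pro-preview openrouter/openai/gpt-5.2"

# Quote the task pattern so the shell hands it to harbor unexpanded.
set -- harbor run --path tasks --task-name 'lighttpd-*' --agent terminus-2

# One --model flag per entry (model names contain no spaces, so word splitting is safe).
for m in $MODELS; do
  set -- "$@" --model "$m"
done
set -- "$@" --n-attempts 3

# DRY_RUN is a hypothetical variable for this sketch, not a harbor feature.
# It defaults to 1, which only prints the command; set DRY_RUN=0 to execute it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' "$*"
else
  "$@"
fi
```

By default the script only prints the assembled command, so you can check it before spending API credits. Run it from the cloned repository with `DRY_RUN=0` to actually start the benchmark.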
Please open a PR if you find something interesting, though our domain experts spend a fair amount of time looking at the trajectories.
Email me. The address is in my profile.
At the same time, tasks differ, and not everything that works best end-to-end is also what works best in a typical, interactive workflow.
We used the Terminus 2 agent because it is the default in Harbor (https://harborframework.com/) and we wanted to stay unbiased. Other agent frameworks would very likely change the results.