I can tell you how I see agents being used with reasonable results, keeping this high level. I don't rely solely on agents; you build agents that augment your capabilities.

They can make diagrams for you, give you an attack surface mapping, and dig for you while you do more manual work. As you work on an audit you will often find things of interest in a binary or code base that you want to investigate further. LLMs can often blast through a code base or binary and find similar things.

I like to think of it as a Swiss Army knife of agentic tools to deploy as you work through a problem. They won't balk at some insanely boring task, and that can give you a real speed-up. The trap is trying to get too much out of an LLM: you end up pouring time into your LLM setup and not getting good results. I think of that as the LLM productivity trap. But if you have a reasonable subset of "skills" / "agents" you can deploy for various auditing tasks, it can absolutely speed you up some.

Also, when you have scale problems, just throw an LLM at it. Even low quality results are a good sniff test. Some of the time I just throw an LLM at a code review thing for a codebase I came across and let it work. I also love asking it to make me architecture diagrams.

reply
> But if you have a reasonable subset of "skills" / "agents" you can deploy for various auditing tasks it can absolutely speed you up some.

Are people sharing these somewhere?

reply
I put the terms in quotes because it can be as simple as a set of prompts you develop for various contexts. It really doesn't have to be too heavy of an idea.
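As a minimal sketch of that idea (every name and all prompt wording below are hypothetical, not taken from any particular tool), a "skill" can be nothing more than a named prompt template that you fill in per context:

```python
from string import Template

# Hypothetical prompt "skills": the names and wording are illustrative only.
SKILLS = {
    "attack_surface": Template(
        "List every externally reachable entry point in $target. "
        "For each one, note the input it accepts and who can reach it."
    ),
    "find_similar": Template(
        "In $target, find code that resembles this pattern and "
        "explain each match briefly:\n$pattern"
    ),
}


def render_skill(name: str, **fields: str) -> str:
    """Fill in one skill's template.

    Raises KeyError for an unknown skill name or a missing placeholder value.
    """
    return SKILLS[name].substitute(**fields)


# Example: build the prompt you would hand to whatever model or agent you use.
prompt = render_skill("attack_surface", target="the HTTP handler package")
```

Keeping each template short and loading only the one you need also keeps the context small.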
reply
I think overall you're better off creating these yourself. The more you add to the overall context, the greater the chance the model screws up somewhere, so you want to give it as little as possible while still including everything that is important at that moment.

Using the agent and seeing where it gets stuck, then creating a workflow/skill/whatever for how to overcome that issue, will also help you understand what scenarios the agents and models are currently having a hard time with.

You'll also end up with fewer workflows/skills, all of which you understand, so you can help steer things and rewrite things when you inevitably have to change something.

reply
Oh, nice find... We ended up using PyGhidra, but the models waste some cycles because of bad ergonomics. Perhaps your CLI would be easier.

Still, Ghidra's most painful limitation was how extremely slow it was with Go binaries. We had to exclude that example from the benchmark.

reply
This is really cool! Thanks for sharing. It's a lot more sophisticated than what I did w/ Ghidra + LLMs.
reply
Thanks for sharing! It seems to be an active space; see a recent MCP server (https://news.ycombinator.com/item?id=46882389). If you haven't yet, I strongly recommend posting it as a Show HN.

I tried a few approaches: https://github.com/jtang613/GhidrAssistMCP (the hardest to set up), Ghidra analyzeHeadless (GPT-5.2-Codex worked with it well!), and PyGhidra (my go-to). Did you try to see which works best?

I mean, very likely (especially with an explicit README for AI, https://github.com/akiselev/ghidra-cli/blob/master/.claude/s...) your approach might be more convenient to use with AI agents.

reply
How does this approach compare to the various Ghidra MCP servers?
reply
There’s not much difference, really. I stupidly didn’t bother looking at prior art when I started reverse engineering, and ghidra-cli was born (along with several others like ilspy-cli and debugger-cli).

That said, it should be easier for a human to follow along with the agent, and Claude Code seems to have an easier time discovering commands as it needs them than having all the tool definitions stuffed into the context.
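The discovery point can be illustrated with a small sketch. The `re-tool` command and its subcommands below are hypothetical and are not ghidra-cli's actual interface; the idea is only that an agent can read a short top-level help text first and look up a subcommand's details when it needs them, rather than carrying every tool definition in its context from the start.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI whose subcommands are discoverable via --help."""
    parser = argparse.ArgumentParser(
        prog="re-tool",  # hypothetical name, for illustration only
        description="Hypothetical reverse-engineering helper.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    functions = sub.add_parser("functions", help="List functions in a binary.")
    functions.add_argument("binary")
    functions.add_argument(
        "--filter", default="", help="Substring to match in function names."
    )

    decompile = sub.add_parser("decompile", help="Decompile one function.")
    decompile.add_argument("binary")
    decompile.add_argument("function")
    return parser


parser = build_parser()

# What an agent (or a human following along) sees from `re-tool --help`.
top_level_help = parser.format_help()

# Parsing one concrete invocation: re-tool functions sample.bin --filter parse
args = parser.parse_args(["functions", "sample.bin", "--filter", "parse"])
```

Because the interface is plain text, a human can rerun the same commands the agent ran and see exactly what it saw.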

reply
That is pretty funny. But you probably learned something in implementing it! This is such a new field, I think small projects like this are really worthwhile :)
reply
I also did this approach (scripts + home-brew cli)...because I didn't know Ghidra MCP servers existed when I got started.

So I don't have a clear idea of what the comparison would be but it worked pretty well for me!

reply