upvote
> Expecting the model to do everything by itself is unrealistic

Well that’s the pitch.

reply
Is it? Aren't most edge LLM capabilities determined by specialized harnesses?
reply
Thank you for your note! As I mention in the post this is not scientific at all.

I'm very curious how you would do multiple runs of multiple models in a "work alongside the model" manner?

reply
Discovering vulnerabilities is a highly creative task, it's when you explore unsual paths that you discover atttack angles. Some bugs are simple, other are a complex orchestration of many factors.

By "Working with the model", is essentially reading the ouput of prompts and pointing in a direction just to decide the next steps. You could try to increase the prompt limit and create an agent that explores multiples directions in a DFS manner.

The issue with vulnerabilities is the agent not knowing when to stop because it's hard to validade if you reach the final result or not. I get amazing result when I code with AI, letting the AI go wild is just a waste a time and tokens.

I recommend you to read the write up on the crackme (https://crackmes.one/crackme/698f40f1e2ba6023bfacaa82), I think most experience developers would need, at least, 2 months of learning reverse engineering techiques to hopefully crack this one. GLM 5.1 manage to solve it, it didn't "copy pasted" any answer from it's training data. It did a binary analysis, anti debug patching, patching binaries, debugging memory during runtime etc. It only took about 20 minutes.

After seeing what GLM did, I do believe Anthropic concerns about Mythos are real. Cracking software just became a lot easier, too easy for my taste. Video games cheats will be the norm, cracked desktop apps without licenses and infected with malware. It's not a new thing but it just became too easy.

reply
Thank you so much for this detailed answer!! Excited to dig into this world more :)
reply
Maybe have a second model that is configured to nudge the first model in the direction of exploration, and have the two of them work in tandem?
reply
>>I've used glm 5.1 on fairly advanced crackme challenges

which have most likely been trained on, so all you did was regurgitate someone elses solution

reply
Anthropic made their models very averse to reverse engineering and vulnerability research chores. It is a difficult problem, but attackers will use models like GLM and defenders will be stuck with security engineering averse models.
reply
Claude used to be good with CTFs, but they added tons of guard rails lately and now it just says "Sorry, I can't help with anything to do with that"
reply
You have to do what I call "Manhattan Project" them. You can almost always evade the controls by carefully prompting them. It just wastes effort and time you should be spending doing other things in an LLM workflow. Essentially, there is almost no single discrete piece of a reverse engineering or CTF process that you can't get Claude to do, you just have to isolate it adequately and avoid letting it use names that attenuate it towards "this is an exploit" or "this is reverse engineering". I have not found a task I could not convince Claude to do. You can also fill the context window up with badgering it and eventually it is likely to simply let you through if you are careful, most of the safe guards are not deterministic.
reply
Sorry, Dave. I can't do that.
reply
[dead]
reply