upvote
> and an unwillingness to write a test case demonstrating the security angle of it.

If the model can't be transparent and tries to hide things from me, then it's a completely useless and untrustworthy tool.

Refusing to write tests is not even remotely a valid solution.

The valid solution is for these labs to understand that: the model is MY agent, not theirs. It should respect my prompts and not refuse.

Hardware supply needs to catch and prices drop so we can all move to local, open weight models. Clearly the hosted options cannot be trusted.

reply
The end result of that is that your model can't fix or acknowledge security issues for fear of disclosing them.

This is the beauty the above poster mentioned: the ability to improve code is inherently coupled with the ability to recognize its shortcomings. You can't have one without the other.

reply
What I suggested would allow it to fix the issues. Just not write a test that was directly usable as a security exploit.

This doesn't stop attackers from being able to leverage the analysis. But it does make the tool more useful for defenders than attackers. Which is the best that you can hope for from a useful tool.

reply
It hides the issue a bit. But if you ask for atomic security fixes and then stare at the diffs you have your vulnerability. There is just a bit more friction involved in the vulnerability => exploit path, but the root cause is unfixed.

I think it even might be possible to route the isolated fix somewhere to automate that last step. Maybe invert the diff and pass it through automated code review for example, see the reasoning when the llm flags the change as dangerous.

reply
> What I suggested would allow it to fix the issues. Just not write a test that was directly usable as a security exploit.

It will be pretty obvious what are security issues in that case - i.e. all the code changes that don't have corresponding tests.

reply
Right but the issue is users have full control over context. A security-violating action by a coding agent in one context can be completely innocuous under other contexts etc, or breaking down the task into multiple tasks that in isolation do not violate anything.
reply
Yes, there is always a path to a problem. Even random monkeys on a keyboard can write a security exploit. Random monkeys with guidance from a knowledgeable human will do it much faster.

The goal shouldn't be to make problems impossible. It is to adjust the ratio between problems and successes.

You can also create a meta. "How much do I trust the user?" When you see the user trying to manipulate towards security, distrust the user and apply rules more strictly. If the user simply acts like a normal developer, just be a useful developer tool. Including fixing security holes when appropriate.

reply
I think they were doing something like this, the tradeoff is that it's hard to do without an irritating number of false positives and/or wasting loads of precious tokens on useless audits.
reply
That would make the model useless
reply
How does this make the model useless? It finds and fixes the security hole. It can even write a test that verifies that the fix didn't break things. But it deliberately doesn't reveal the fact that it was a security issue that was fixed.

Seems useful to me. But more useful for defenders than attackers.

reply
Imagine that you have the repo A, ask the model to "fix the security issue" and end up with A'.

Just take the Diff A' - A to see the security hole.

reply