If the model can't be transparent and tries to hide things from me, then it's a completely useless and untrustworthy tool.
Refusing to write tests is not even remotely a valid solution.
The valid solution is for these labs to understand that the model is MY agent, not theirs. It should respect my prompts and not refuse.
Hardware supply needs to catch up and prices need to drop so we can all move to local, open-weight models. Clearly the hosted options cannot be trusted.
This is the beauty the above poster mentioned: the ability to improve code is inherently coupled with the ability to recognize its shortcomings. You can't have one without the other.
This doesn't stop attackers from leveraging the analysis. But it does make the tool more useful for defenders than for attackers, which is the best you can hope for from a useful tool.
I think it might even be possible to route the isolated fix somewhere to automate that last step. For example, invert the diff, pass it through automated code review, and look at the reasoning when the LLM flags the change as dangerous.
It will be pretty obvious which changes are security fixes in that case, i.e. all the code changes that don't have corresponding tests.
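A rough sketch of that heuristic, just to make it concrete: given the list of files touched by a change, flag source files whose tests weren't touched. The naming convention (`foo.py` is tested by `test_foo.py`) and the function name `untested_changes` are hypothetical assumptions; real projects map code to tests in many different ways.

```python
from pathlib import PurePosixPath


def untested_changes(changed_paths):
    """Return changed .py source files that have no matching changed test file.

    Hypothetical convention: source file foo.py is covered by test_foo.py.
    """
    changed = [PurePosixPath(p) for p in changed_paths]
    # Module names whose test file was modified in this change.
    tested_stems = {
        p.stem[len("test_"):] for p in changed if p.name.startswith("test_")
    }
    return [
        str(p)
        for p in changed
        if p.suffix == ".py"
        and not p.name.startswith("test_")
        and p.stem not in tested_stems
    ]


diff_files = ["app/auth.py", "app/views.py", "tests/test_views.py"]
print(untested_changes(diff_files))  # ['app/auth.py']
```

Here the edit to `app/auth.py` stands out because nothing in the change exercises it.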
The goal shouldn't be to make problems impossible. It is to adjust the ratio between problems and successes.
You can also add a meta-level question: "How much do I trust the user?" When you see the user trying to steer the model toward security-sensitive output, distrust the user and apply the rules more strictly. If the user simply acts like a normal developer, just be a useful developer tool, including fixing security holes when appropriate.
Seems useful to me. But more useful for defenders than attackers.
Just take the diff A' - A to see the security hole.
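A minimal sketch of "A' - A" using Python's standard `difflib`: the unified diff between the original file A and the model's fixed version A' isolates exactly the lines that changed. The two code snippets below are made-up placeholders, not taken from any real fix.

```python
import difflib

# Hypothetical original (A) and model-fixed (A') versions of a file.
original = """def load(path):
    return open(path).read()
"""
fixed = """def load(path):
    if ".." in path:
        raise ValueError("path traversal")
    return open(path).read()
"""


def fix_diff(a, a_prime):
    """Unified diff from A to A'; the added lines point at the hole that was fixed."""
    return "".join(
        difflib.unified_diff(
            a.splitlines(keepends=True),
            a_prime.splitlines(keepends=True),
            fromfile="A",
            tofile="A'",
        )
    )


print(fix_diff(original, fixed))
```

Swapping the arguments, `fix_diff(fixed, original)`, gives the inverted diff, i.e. the change that would reintroduce the hole.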