In the Mythos blog post they revealed that they ran the model something like 1000 times on the same codebase, maybe with a slightly different prompt or temperature each time. That suggests it will just be pay to win. If the 'attacker' spends more money/tokens than the 'defender', the defender will eventually be outclassed.
reply
It's even worse, it's loot box style. Not pay to win, but pay to have the chance to win. The result will always be non-deterministic, so in some cases it can give you what you're looking for on the first try, or it can take 1000 tries.
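
To put rough numbers on the "chance to win" idea, here is a minimal sketch. It assumes every run is independent and has the same chance of finding a given bug, which is my simplification and not something the blog post states. The 0.5% per-run hit rate is a made-up figure for illustration, and the function names are hypothetical.

```python
import math


def chance_of_at_least_one_hit(p_per_run: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs finds the bug."""
    return 1.0 - (1.0 - p_per_run) ** runs


def runs_needed(p_per_run: float, target: float) -> int:
    """Smallest number of runs whose combined hit chance reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_per_run))


if __name__ == "__main__":
    p = 0.005  # hypothetical per-run hit rate, not a measured number
    for n in (1, 10, 100, 1000):
        print(f"{n:>4} runs: {chance_of_at_least_one_hit(p, n):.3f}")
    print("runs for a 95% chance:", runs_needed(p, 0.95))
```

Under those assumptions, more runs push the odds up but never to certainty, and each extra run buys less than the one before. So whoever can afford more pulls has the edge, but nobody gets a guarantee.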
reply
It’s never not been “loot box style”. None of your past hired security audits were guaranteed to catch all issues?
reply
You are supposed to run it on the full codebase before any single PR gets merged.
reply
Companies don't make production pushes yearly. For many, it's two-week sprints, and that's just one project.

This doesn't make any sense cost-wise. It would be cheaper to just hire a security engineer.

reply
I agree the cost curve has shifted. But if we take the Mozilla team's Mythos report as a broad baseline, you'd need to hire something like 10 security engineers to match Mythos's productivity. Put another way, everyone's underhiring security by a LOT right now; we've just been lucky enough that the hackers have been similarly understaffed.
reply