May even be an order of magnitude more
Ensuring code isn’t bad is the expensive part.
The definition of "bad" from a security PoV is rapidly expanding, in light of relatively new capabilities and increasingly cheap access to exploitable vulnerabilities.
Those costs can be extremely high.
I expect at some point formal verification will become more economical than red teaming. Writing it correctly is more expensive, but it may be cheaper than trying to secure incorrect software.
(Or rather, as hacking incorrect software becomes vastly cheaper, the amount of software worth writing properly will increase.)
I've been thinking, by Dijkstra's standards we have already been vibe coding for almost a century :)
The basic security flaws with regards to input validation and overflows should never ever be output by an AI. For "security flaws due to bad design" I'll cut them slack until AGI is achieved.
The most interesting security bugs have causes that are spread across large codebases, or networks of dependencies.
Training the AI to "output secure code" won't work if it doesn't also have access to the source code of every dependency that it's using... and even then, given current model speeds and prices most developers won't want to wait for an hour on every edit they make while the LLM reasons through all of the dependencies.
Vulnerability discovery has essentially moved to a "proof of work" computation model with AI that has some similarities to crypto like BTC or ethereum 1.0. I don't see any reason a well funded adversary couldn't use this same process on open-source code to develop exploits. I'm sure AI would be happy to try and create exploits from the results rather than fixes.
This sort of proof of work has a notable difference from crypto in the asymmetric nature of what each side is targeting. In crypto, each miner was attempting to find a solution to the same problem and they would all move on to a new one once a solution is found. However with AI vulnerability scanning, the non-deterministic nature means an adversary is likely to find different vulnerabilities. Even if it doesn't, the adversaries have a different post-discovery workflow (i.e. probably less compute intensive aka cheaper due to only needing one viable exploit to win) than the software maintainers do.
Considering it's possible both the adversary and their target could both do all this while running Claude puts Anthropic in a real "Merchant of Death" position.
The goal of AI-generated code should not be that one needs a AI-based security review tool on top of it, but that the AI-generated code in itself is reasonably secure.
It's an estimate, so it might be wrong, but it gives the ballpark based on our experience. Happy to hear everyone's feedback.
But even this larger number, in turn, can be about 1/10th the cost of a formal engagement to discover the type of findings it seems to be going for: things that do not show up from PR reviews or even /security-review without the pre-work steps in the open-source framework guided by an expert. That's not counting the time and delay to figure out how to do that engagement.
Bluntly: if it matters, while this is a month's vibing budget for a single scan, it is also "pennies on the dollar" dirt cheap.
At the same time, its findings still need an expert. Its suggestions may be helpful, they may be actively harmful, depends on the prework quality.
Recommendation to IT department heads: spend a couple grand on this, use the scare page to rustle up the budget to build a relationship with a red team that can find, triage, help remediate if needed, and train your in-house team to be "security minded".
Reminiscent of the early days of tax automation where importing a W2 cost hundreds of dollars until people realized typing in 6 boxes worth of data was easy and paying the automation fee ate up their entire tax return.
This doesn't make any sense cost-wise. It would be cheaper to just hire a security engineer.