It's becoming apparent that it requires more tokens to secure code than it does to write it

May even be an order of magnitude more

reply
In all seriousness, wasn’t that always the case? Writing bad code is relatively cheap.

Ensuring code isn’t bad is the expensive part.

reply
Sort of?

The definition of "bad" from a security PoV is rapidly expanding, in light of relatively new capabilities and increasingly cheap access to exploitable vulnerabilities.

reply
I don't think the definition of "bad" is expanding. Rather the ability to detect and exploit "bad" is.
reply
fair point. another way of putting it might be to say that, for all extant software, much more of it is "bad" than we realized even a month or two ago -- and the cost to create and maintain "good" software is increasing (even as the naive / surface-level / apparent cost is plummeting)
reply
Same thing happened with the growth of the internet. There was a time when there was basically no consideration of buffer overflow.
reply
For now, maybe, yes? But the most important targets of this kind of work aren't AI outputs; it's legacy code, particularly (but not exclusively) old memory-unsafe code. In those situations the figure of merit isn't the token cost of recreating the target code; it's the cost of finding the same bugs with humans or preexisting tools.

Those costs can be extremely high.

reply
Any newly produced AI code is immediately legacy and trash at the same time.
reply
There's a parallel between looking for bugs and mining. As models get smarter, they'll find "deeper bugs".

I expect at some point formal verification will become more economical than red teaming. Writing it correctly is more expensive, but it may be cheaper than trying to secure incorrect software.

(Or rather, as hacking incorrect software becomes vastly cheaper, the amount of software worth writing properly will increase.)

I've been thinking, by Dijkstra's standards we have already been vibe coding for almost a century :)

reply
Not if the original code is secure...
reply
Are AI firms going to charge us to write code, and then charge us even more to secure it?!
reply
Yes, obviously. Infosec has always been plagued by this. How many services make you pay for SSO?
reply
Given the slop that's made its way to GitHub we can see that this is a great profit model. Ship slop and then "fix" slop. What an efficient use of our planet!
reply
It's weird because why can't they train the AI to simply output secure code?

The basic security flaws with regard to input validation and overflows should never ever be output by an AI. For "security flaws due to bad design" I'll cut them slack until AGI is achieved.

reply
> It's weird because why can't they train the AI to simply output secure code?

The most interesting security bugs have causes that are spread across large codebases, or networks of dependencies.

Training the AI to "output secure code" won't work if it doesn't also have access to the source code of every dependency that it's using... and even then, given current model speeds and prices most developers won't want to wait for an hour on every edit they make while the LLM reasons through all of the dependencies.

reply
What's destabilizing the industry right now isn't vulnerabilities AI introduces into new code; it's a flood of sev:hi vulnerabilities in existing code, not introduced by AI but discovered by it.
reply
Agreed -- and, compounding the challenge, the flood of _reported_ high-sev CVEs is itself a kind of DDoS attack on maintainers.
reply
> What's destabilizing the industry right now isn't vulnerabilities AI introduces into new code; it's a flood of sev:hi vulnerabilities in existing code, not introduced by AI but discovered by it.

Vulnerability discovery has essentially moved to a "proof of work" computation model with AI, one that has some similarities to cryptocurrencies like BTC or Ethereum 1.0. I don't see any reason a well-funded adversary couldn't use this same process on open-source code to develop exploits. I'm sure AI would be happy to try to create exploits from the results rather than fixes.

This sort of proof of work has a notable difference from crypto in the asymmetric nature of what each side is targeting. In crypto, each miner was attempting to find a solution to the same problem and they would all move on to a new one once a solution is found. However with AI vulnerability scanning, the non-deterministic nature means an adversary is likely to find different vulnerabilities. Even if it doesn't, the adversaries have a different post-discovery workflow (i.e. probably less compute intensive aka cheaper due to only needing one viable exploit to win) than the software maintainers do.

Considering that the adversary and their target could both be doing all of this while running Claude, Anthropic is put in a real "Merchant of Death" position.

reply
This doesn't make sense. Claude isn't creating the vulnerabilities. They've been here the whole time. You just get to know about them now.
reply
Even before that everybody was getting drowned in shitty reports from automated tools.

The goal of AI-generated code should not be that one needs an AI-based security review tool on top of it, but that the AI-generated code is itself reasonably secure.

reply
I think these audit tools can look beyond just security and can handle compliance audits as well. The ability to audit real targets in staging environments makes it easy to identify issues.
reply
I think Opus is already prohibitively expensive, so I'm not sure how that would compare to Mythos. Check this calculator: it shows that a company with 100 devs can hit ~$2.5M in token costs annually, which is wild! https://ai-cost-calculator.arnica.io
reply
A 100 dev team is going to cost on the order of $25M a year (keep in mind cost is not just their salary, but also the HR/management org to support a team of that size, benefits, office space, hardware/software). So if you think you get a 10% boost in productivity out of Opus, it's not prohibitive at all.
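
To make that arithmetic concrete, here is a minimal sketch using only the two figures quoted in this thread (~$25M/year for the team, ~$2.5M/year on tokens). The function names and the linear "boost times team cost" model are assumptions for illustration, not anything the calculator itself does.

```python
# Figures quoted in the thread; treat both as rough estimates.
TEAM_COST_USD = 25_000_000   # ~100 devs, fully loaded, per year
TOKEN_COST_USD = 2_500_000   # annual token spend from the linked calculator


def break_even_boost(team_cost: float, token_cost: float) -> float:
    """Fractional productivity gain at which token spend pays for itself."""
    return token_cost / team_cost


def net_value(team_cost: float, token_cost: float, boost: float) -> float:
    """Value of the extra output minus token spend (hypothetical linear model)."""
    return team_cost * boost - token_cost


if __name__ == "__main__":
    be = break_even_boost(TEAM_COST_USD, TOKEN_COST_USD)
    print(f"break-even boost: {be:.0%}")
    for boost in (0.05, 0.10, 0.20):
        value = net_value(TEAM_COST_USD, TOKEN_COST_USD, boost)
        print(f"boost {boost:.0%}: net ${value:,.0f}")
```

Under these numbers a 10% boost is roughly the break-even point, so the spend only clearly pays off if the real gain is higher than that.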
reply
It's wild, but how many FLOPs of computation are going into those $2.5M in tokens? It might not sound quite as wild using that metric.
reply
Claude workflows in ultra code mode work in a very similar fashion and consume a moderate amount of the session usage limit, depending on the complexity of the task. With the API it would probably get expensive quickly, though.
reply
We actually created a calculator to estimate scanning costs (including whether you do this continuously or not) https://ai-cost-calculator.arnica.io

It's an estimate, so it might be wrong, but it gives the ballpark based on our experience. Happy to hear everyone's feedback.

reply
If you compare to their managed service, that estimate is likely 1/10th expectation, depending on codebase.

But even this larger number, in turn, can be about 1/10th the cost of a formal engagement to discover the type of findings it seems to be going for: things that do not show up from PR reviews or even /security-review without the pre-work steps in the open-source framework guided by an expert. That's not counting the time and delay to figure out how to do that engagement.

Bluntly: if it matters, while this is a month's vibing budget for a single scan, it is also "pennies on the dollar" dirt cheap.

At the same time, its findings still need an expert. Its suggestions may be helpful, they may be actively harmful, depends on the prework quality.

Recommendation to IT department heads: spend a couple grand on this, use the scare page to rustle up the budget to build a relationship with a red team that can find, triage, help remediate if needed, and train your in-house team to be "security minded".

reply
Just another example of an overextension of technology in a scenario where applying a proper harness would suffice.

Reminiscent of the early days of tax automation where importing a W2 cost hundreds of dollars until people realized typing in 6 boxes worth of data was easy and paying the automation fee ate up their entire tax return.

reply
I mean, you don't need to run it all the time, right? You do it once over your entire existing codebase to start and then once over the diff in your CI/CD pipeline when you make a new change. I'm sure it's not literally that simple but I doubt these need to churn 24/7/365 either.
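
A rough sketch of why that distinction matters for cost. Every number here (token counts, price, change frequency) is a made-up placeholder, and the model ignores repeated runs, context overhead, and output tokens.

```python
def scan_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one scan pass over `tokens` tokens of input."""
    return tokens / 1_000_000 * usd_per_million_tokens


def annual_cost_diff_only(codebase_tokens: int, diff_tokens: int,
                          changes_per_year: int, price: float) -> float:
    """One full scan up front, then one scan per change over the diff only."""
    return (scan_cost(codebase_tokens, price)
            + changes_per_year * scan_cost(diff_tokens, price))


def annual_cost_full_each_change(codebase_tokens: int,
                                 changes_per_year: int, price: float) -> float:
    """Full-codebase scan before every change is merged."""
    return changes_per_year * scan_cost(codebase_tokens, price)


if __name__ == "__main__":
    # Hypothetical inputs: 10M-token codebase, 10k-token diffs,
    # 100 changes a year, $10 per million tokens.
    print(annual_cost_diff_only(10_000_000, 10_000, 100, 10.0))
    print(annual_cost_full_each_change(10_000_000, 100, 10.0))
```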
reply
In the Mythos blog post they revealed that they ran the model something like 1000 times on the same codebase, maybe with a slightly different prompt or temperature each time. That suggests it will just be pay to win. If the 'attacker' spends more money/tokens than the 'defender', you will eventually be outclassed.
reply
It's even worse, it's loot box style. Not pay to win, but pay to have the chance to win. The result will always be non-deterministic, so in some cases it can give you what you're looking for on the first try, or it can take 1000 tries.
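
The loot box framing can be put in numbers with a simple model, assuming (and this is a big assumption) that each run is independent and has the same fixed chance `p_single` of surfacing a given bug. The value 0.005 below is purely hypothetical.

```python
import math


def p_found(p_single: float, runs: int) -> float:
    """Chance that at least one of `runs` independent runs finds the bug."""
    return 1.0 - (1.0 - p_single) ** runs


def runs_needed(p_single: float, target: float) -> int:
    """Smallest number of runs giving at least `target` chance of a find."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


if __name__ == "__main__":
    p = 0.005  # hypothetical per-run hit rate for one specific bug
    for n in (1, 100, 1000):
        print(f"{n:>4} runs: {p_found(p, n):.3f}")
    print("runs for a 95% chance:", runs_needed(p, 0.95))
```

In this model more spend always raises the odds but never reaches certainty, which is the "pay to have the chance to win" point.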
reply
It’s never not been “loot box style”. None of your past hired security audits were guaranteed to catch all issues?
reply
You are supposed to run it on the full codebase before any single PR gets merged.
reply
Companies don't make production pushes yearly. For many, it's two-week sprints... and that's one project.

This doesn't make any sense cost-wise. It would be cheaper to just hire a security engineer.

reply
I agree the cost curve has shifted. But if we take the Mozilla team's Mythos report as a broad baseline, you need to hire something like 10 security engineers to equal the Mythos productivity. Put another way, everyone's under-hiring security by a LOT right now; we have just been lucky enough to see similar under-hiring among hackers.
reply