At some point security becomes: the program does exactly what the human asked for, which turns out not to be what they actually wanted.
No amount of testing can fix logic bugs caused by a bad specification.
Each of the last four comments in your thread (including yours) means something different by "AI".
But my argument is that we can work to minimize the time we spend on verifying the code-level accidental complexity.
And we've had some successes, but I wouldn't expect any game-changing breakthroughs any time soon.
I'm sure we'll have vibed infrastructure and slow infrastructure, and one of them will burn down more frequently. Only time will tell who survives the onslaught and who gets dropped, but I personally won't be making any bets on slow infrastructure.
As a trivial example, I just found a piece of irrelevant crap in some code I generated a couple of weeks ago. It worked in the simple cases, which is why I never spotted it, but it would have had some weird effects in more complicated ones. Perhaps my prompting didn't explain things well enough, but how was I to know I'd failed without reading the code?
>We do not need vibe-coded critical infrastructure.
I think when you have virtually unlimited compute, it affords the ability to really lock down test writing and code review to a degree that isn't possible with normal vibe code setups and budgets.
That said, for truly critical things, I could see a final human review step for a given piece of generated code, followed by a hard lock. That workflow is going to be popular if it isn't already.
Perhaps part of a complex review chain for said function that's a few hundred LLM invocations total.
So long as there's a human reviewing it at the end and it gets locked, I'd argue it ultimately doesn't matter how the code was initially created.
There are a lot of reasons it would matter before it gets to that point, though more to do with system design concerns. Of course, you could also argue safety is an ongoing process that partially derives from system design, and you wouldn't be wrong.
It occurred to me there's some recent prior art here:
https://news.ycombinator.com/item?id=47721953
It's probably fair to say the Linux kernel is critical infra, or at least a component piece in a lot of it.
In the not-so-distant future you'll probably be one of the few whose actual coding skills haven't atrophied, and that's a good thing.
Hiring a few core devs to work on it should be a rounding error to Anthropic and a huge flex if they are actually able to deliver.
So, should I trust an LLM as much as a C compiler?
That's not true for coding in general. The best you can do is have unreasonably good test coverage, and the vast majority of code doesn't have that.
Servo may not be the best project for this experiment, as it has a strict no-AI-contributions policy.
It's the maintenance. The long term, slow burn, uninteresting work that must be done continually. Someone needs to be behind it for the long haul or it will never get adopted and used widely.
Right now, at least, LLMs are not great at that. They're great for quickly creating smaller projects. They get less good the older and larger those projects get.
https://x.com/mitchellh/status/2029348087538565612
Stuff like this, where these models are root-causing nontrivial large-scale bugs, is already happening in SOTA models.
I would not be surprised if next-generation models can both resolve those issues more reliably and implement the fixes better. At that point they would be sufficiently good maintainers.
They are suggesting that new models can chain multiple newly discovered vulnerabilities into RCE, privilege escalation, etc. You can't do that, at least not reliably, without larger-scope planning and understanding.
Replicating Rust would also be a good one. There are many Rust-adjacent languages that ought to exist and would greatly benefit mankind if they were created.
I read the link twice and no AI or LLM mentioned. I don't know why people are so eager to chime in and try to steer the conversation towards AI.