At some point security becomes this: the program does exactly what the human asked for, but the human didn't realize they didn't actually want that.
No amount of testing can fix logic bugs due to bad specification.
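To make the specification point concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `apply_discount` function, the one-line spec, and the test values are made up for illustration). The code matches the spec and every test derived from the spec passes, yet the behavior is still not what anyone wanted, because the spec never said the discount has to stay at or below 100%.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical spec, as written: 'reduce the price by `percent` percent'."""
    return price_cents - price_cents * percent // 100


def spec_tests_pass() -> bool:
    # Tests written straight from the spec. They all pass.
    return (
        apply_discount(1000, 0) == 1000
        and apply_discount(1000, 25) == 750
        and apply_discount(1000, 100) == 0
    )


# The spec never mentions an upper bound, so no test derived from it
# checks one. A 150% coupon makes the price negative (the shop pays the
# customer), and the code is still "correct" according to the spec.
negative_price = apply_discount(1000, 150)
```

More tests written against the same spec would not catch this; someone has to notice that the spec itself is missing a constraint.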
Each of the last 4 comments in your thread (including yours) is conflating different meanings of AI.
But my argument is that we can work to minimize the time we spend on verifying the code-level accidental complexity.
And we've had some successes, but I wouldn't expect any game-changing breakthroughs any time soon.
I'm sure we'll have vibed infrastructure and slow infrastructure, and one of them will burn down more frequently. Only time will tell who survives the onslaught and who gets dropped, but I personally won't be making any bets on slow infrastructure.
As a trivial example, I just found a piece of irrelevant crap in some code I generated a couple of weeks ago. It worked in the simple cases, which is why I never spotted it, but it would have had some weird effects in more complicated ones. Perhaps my prompting didn't explain things well enough, but how was I to know I had failed without reading the code?
>We do not need vibe-coded critical infrastructure.
I think when you have virtually unlimited compute, it affords the ability to really lock down test writing and code review to a degree that isn't possible with normal vibe code setups and budgets.
That said, for truly critical things, I could see a final human review step for a given piece of generated code, followed by a hard lock. That workflow is going to be popular if it isn't already.
Perhaps part of a complex review chain for said function that's a few hundred LLM invocations total.
So long as there's a human reviewing it at the end and it gets locked, I'd argue it ultimately doesn't matter how the code was initially created.
There are a lot of reasons it would matter before it gets to that point, but they have more to do with system design concerns. Of course, you could also argue safety is an ongoing process that partially derives from system design and you wouldn't be wrong.
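For what a "human review, then hard lock" step might look like, here is a rough Python sketch. It rests on assumptions, not on any existing tool: `ReviewLock`, `approve`, and `is_unchanged` are hypothetical names, and the lock is just a SHA-256 fingerprint of the reviewed source kept in memory. A real setup would presumably store the fingerprints somewhere tamper-resistant and check them in CI.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Return the SHA-256 hex digest of a piece of source code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class ReviewLock:
    """Hypothetical registry of code that a human has reviewed and locked."""

    def __init__(self) -> None:
        # Maps a function name to the fingerprint of its approved source.
        self._approved = {}

    def approve(self, name: str, source: str) -> None:
        # Called once, after the final human review of `source`.
        self._approved[name] = fingerprint(source)

    def is_unchanged(self, name: str, source: str) -> bool:
        # True only if `name` was approved and `source` is byte-identical
        # to what the reviewer saw.
        return self._approved.get(name) == fingerprint(source)


lock = ReviewLock()
reviewed = "def f():\n    return 1\n"
lock.approve("f", reviewed)
```

With this, `lock.is_unchanged("f", reviewed)` holds, while any regenerated or edited version of `f` fails the check and would have to go back through review, regardless of how the code was produced in the first place.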
It occurred to me there's some recent prior art here:
https://news.ycombinator.com/item?id=47721953
It's probably fair to say the Linux kernel is critical infra, or at least a component piece in a lot of it.
In the not so distant future you'll probably be one of the few who haven't had their actual coding skills atrophy, and that's a good thing.
Hiring a few core devs to work on it should be a rounding error to Anthropic and a huge flex if they are actually able to deliver.
So, should I trust an LLM as much as a C compiler?
That's not true for coding in general. The best you can do is have unreasonably good test coverage, but the vast majority of code doesn't have that.