I have been thinking about that lately: aren't testing and security evaluation a far harder problem than designing and carefully implementing new features? I think vibecoding automates the easiest step in software development while making the more challenging and expensive steps harder. How are we supposed to debug complex problems in critical infrastructure if no one understands the code? It is possible that agents will be able to do that in the future, but it feels to me that we are not there yet.

by bawolff 2 days ago

I don't think that will ever be possible.

At some point, security becomes a question of the program doing exactly what the human asked for, when the human didn't realize that wasn't what they actually wanted.

No amount of testing can fix logic bugs due to bad specification.

by skrtskrt 1 day ago

AI as advanced fuzz-testing is ridiculously helpful though. Hardly any bug you can find in this sort of advanced system is a specification logic bug. It's low-level security stuff: finding ways to DDoS a local process, working around OS-level security restrictions, etc.
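
For readers unfamiliar with the baseline being discussed, here is a minimal sketch of plain random fuzzing, with no AI involved. Everything in it is an assumption made for illustration: `parse_length_prefixed` is a hypothetical target with a deliberately planted bug, and `fuzz` is a toy harness, not any real fuzzer's API.

```python
import random

def parse_length_prefixed(data: bytes) -> tuple[bytes, int]:
    """Hypothetical target: the first byte is a length, the rest is the payload."""
    if not data:
        raise ValueError("empty input")  # documented rejection
    length = data[0]
    payload = data[1:1 + length]
    # Planted bug: trusts the length byte, so a short payload raises IndexError.
    last_byte = payload[length - 1] if length else 0
    return payload, last_byte

def fuzz(target, runs: int = 2000, seed: int = 0) -> list[bytes]:
    """Feed random byte strings to `target` and collect inputs that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(runs):
        size = rng.randrange(0, 8)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # expected, documented failure mode
        except Exception:
            crashes.append(data)  # anything else counts as a finding
    return crashes

crashing_inputs = fuzz(parse_length_prefixed)
```

A harness like this only reports that the target did something undocumented. It says nothing about whether the documented behaviour is the behaviour anyone actually wanted, which is the distinction the parent comment draws.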

by bawolff 1 day ago

I'm kind of doubtful that AI is all that great at fuzz testing. Putting that aside though, we are talking about web browsers here. Security issues from a bad specification or from misunderstanding the specification are relatively common.

by thephyber 1 day ago

Re-read the thread you are replying to.

Each of the last 4 comments in your thread (including yours) conflates different things under the label "AI".

by skrtskrt 20 hours ago

You must be lost.

by falcor84 1 day ago

Well, yes, agreed - that is the essential domain complexity.

But my argument is that we can work to minimize the time we spend on verifying the code-level accidental complexity.

by bawolff 1 day ago

Sure, but that is what we've been doing since the early 2000s (e.g. ASLR, non-executable stacks, static analysis, etc.).

And we've had some successes, but I wouldn't expect any game-changing breakthroughs any time soon.

by mort96 2 days ago

I disagree. Thorough testing provides some level of confidence that the code is correct, but there's immense value in having infrastructure which some people understand because they wrote it. No amount of process around your vibe slop can provide that.

by px43 2 days ago

That's just status quo, which isn't really holding up in the modern era IMO.

I'm sure we'll have vibed infrastructure and slow infrastructure, and one of them will burn down more frequently. Only time will tell who survives the onslaught and who gets dropped, but I personally won't be making any bets on slow infrastructure.

by falcor84 2 days ago

I somewhat agree, but even then I would argue that the proper level for this understanding is the architecture and the data-flow invariants, rather than the code itself. And these can actually be enforced quite well as tests against human-authored diagrammatic specs.
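
As a rough illustration of what such a test might look like (a minimal sketch under stated assumptions, not something taken from this thread), a human-written spec of which layer may depend on which can be checked against the imports found in the code. The layer names and the `ALLOWED` table are hypothetical, and the spec is written as a plain dictionary rather than a diagram.

```python
import ast

# Hypothetical human-authored spec: which layer may import which.
ALLOWED = {
    "ui": {"service"},
    "service": {"storage"},
    "storage": set(),
}

def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found

def violations(layer: str, source: str) -> set[str]:
    """Layers imported by `source` that the spec does not allow for `layer`."""
    other_layers = set(ALLOWED) - {layer}
    return (imported_modules(source) & other_layers) - ALLOWED[layer]
```

A check like `violations("ui", "from storage.db import get")` flags `storage`, because the spec only lets `ui` reach `service`. It verifies the shape of the dependencies only; it cannot say whether the code inside each layer behaves as intended.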

by t43562 1 day ago

If you don't fully understand the code, how do you know that it implements your architecture exactly, and that it does so without implications you hadn't thought of?

As a trivial example, I just found a piece of irrelevant crap in some code I generated a couple of weeks ago. It worked in the simple cases, which is why I never spotted it, but it would have had some weird effects in more complicated ones. Perhaps my prompting didn't explain things well enough, but how was I to know I had failed without reading the code?

by jbvlkt 1 day ago

Exactly. We do not have any artifact other than code that can be deterministically converted into a program. That is the reason we still have to read the code. The prompt is not the final product of the development process.

by mort96 1 day ago

I disagree. The code itself matters too.

by irishcoffee 1 day ago

Who is writing the tests? An LLM? If so, they have little value.