At some point security becomes: the program does exactly what the human asked for, which turns out not to be what they actually wanted.
No amount of testing can fix logic bugs caused by a bad specification.
Each of the last four comments in your thread (including yours) means something different by "AI".
But my argument is that we can work to minimize the time we spend on verifying the code-level accidental complexity.
And we've had some successes, but I wouldn't expect any game-changing breakthroughs any time soon.
I'm sure we'll have vibed infrastructure and slow infrastructure, and one of them will burn down more frequently. Only time will tell who survives the onslaught and who gets dropped, but I personally won't be making any bets on slow infrastructure.
As a trivial example, I just found a piece of irrelevant crap in some code I generated a couple of weeks ago. It worked in the simple cases, which is why I never spotted it, but it would have had some weird effects in more complicated ones. Perhaps my prompting didn't explain things well enough, but how was I to know I'd failed without reading the code?
>We do not need vibe-coded critical infrastructure.
I think when you have virtually unlimited compute, it affords the ability to really lock down test writing and code review to a degree that isn't possible with normal vibe code setups and budgets.
That said, for truly critical things, I could see a final human review step for a given piece of generated code, followed by a hard lock. That workflow is going to be popular if it isn't already.
Perhaps part of a complex review chain for said function that's a few hundred LLM invocations total.
So long as there's a human reviewing it at the end and it gets locked, I'd argue it ultimately doesn't matter how the code was initially created.
There are a lot of reasons it would matter before it gets to that point, though more to do with system design concerns. Of course, you could also argue safety is an ongoing process that partially derives from system design, and you wouldn't be wrong.
It occurred to me there's some recent prior art here:
https://news.ycombinator.com/item?id=47721953
It's probably fair to say the Linux kernel is critical infra, or at least a component piece in a lot of it.
In the not-so-distant future you'll probably be one of the few whose actual coding skills haven't atrophied, and that's a good thing.
Hiring a few core devs to work on it should be a rounding error to Anthropic and a huge flex if they are actually able to deliver.
So, should I trust an LLM as much as a C compiler?
That's not true for coding in general. The best you can do is have unreasonably good test coverage, and the vast majority of code doesn't have that.
Servo may not be the best project for this experiment, as it has a strict no-AI-contributions policy.
It's the maintenance. The long term, slow burn, uninteresting work that must be done continually. Someone needs to be behind it for the long haul or it will never get adopted and used widely.
Right now, at least, LLMs are not great at that. They're great for quickly creating smaller projects. They get less good the older and larger those projects get.
https://x.com/mitchellh/status/2029348087538565612
Stuff like this, where these models are root-causing nontrivial large-scale bugs, is already happening in SOTA models.
I would not be surprised if next-generation models can both resolve those issues more reliably and implement the fixes better. At that point they would be sufficiently good maintainers.
They are suggesting that new models can chain multiple newly discovered vulnerabilities into RCE, privilege escalation, etc. You can't do that, at least not reliably, without larger-scope planning and understanding.
Replicating Rust would also be a good one. There are many Rust-adjacent languages that ought to exist and would greatly benefit mankind if they were created.
I read the link twice and no AI or LLM mentioned. I don't know why people are so eager to chime in and try to steer the conversation towards AI.