That's what formal verification is about. I did some (using PSL for hardware verification); writing the formal spec is way harder than writing the actual code. It will find a lot of subtle issues, and you spend most of the time deciding if it's the spec or the code that's wrong.

Having the code-writing part automated would have a negligible impact on the total project time.

reply
> humanic

No, thank you

reply
This is a task that humans are exceptionally bad at, because we are not computers. If something uses the right words in the right order such that it communicates the correct algorithm to a human, then a human is likely to say "yup, that's correct", even if an hour's study of those 15 lines would reveal that a subtle punctuation choice, or a subtle mismatch between a function's name and its semantics, means it implements a different algorithm from the expected one.
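To make that concrete, here's a made-up Python illustration of a name/semantics mismatch (the function names are hypothetical, not from any real codebase). The first version reads as correct at a glance, and the docstring agrees with the name, but the comparison operator says something else:

```python
def is_strictly_increasing(xs):
    """Return True if every element is greater than the one before it."""
    # Reads fine at a glance, but `<=` also accepts equal neighbours,
    # so this actually checks "non-decreasing", not "strictly increasing".
    return all(a <= b for a, b in zip(xs, xs[1:]))


def is_strictly_increasing_fixed(xs):
    """What the name promises: each element exceeds the previous one."""
    return all(a < b for a, b in zip(xs, xs[1:]))


print(is_strictly_increasing([1, 2, 2, 3]))        # True: contradicts the name
print(is_strictly_increasing_fixed([1, 2, 2, 3]))  # False
```

A reviewer skimming the first function sees the right words in the right order; only the single `=` character gives it away.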

LLMs do not understand prose or code in the same way humans do (such that "understand" is misleading terminology), but they understand them in a way that's way closer to fuzzy natural language interpretation than pedantic programming language interpretation. (An LLM will be confused if you rename all the variables: a compiler won't even notice.)
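The compiler half of that parenthetical is easy to check for yourself. Here's a rough sketch, assuming CPython (the functions are invented for illustration): two functions that differ only in their identifiers compile to the same bytecode, because locals are referenced by slot index rather than by name. It says nothing about the LLM half, which you'd have to test against an actual model.

```python
def total_price(prices, tax_rate):
    subtotal = sum(prices)
    return subtotal * (1 + tax_rate)


# Same logic with every identifier replaced by a meaningless one.
def f(a, b):
    c = sum(a)
    return c * (1 + b)


# The raw instruction bytes do not contain the local variable names;
# those live separately in co_varnames.
same_bytecode = total_price.__code__.co_code == f.__code__.co_code
names_differ = total_price.__code__.co_varnames != f.__code__.co_varnames

print(same_bytecode, names_differ)
print(total_price([10, 20], 0.5) == f([10, 20], 0.5))
```

To a human (or anything else that leans on names for meaning), `f` is much harder to read than `total_price`; to the interpreter they are the same program.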

So we've built a machine that makes the kinds of mistakes that humans struggle to spot, used RLHF to optimise it for persuasiveness, and now we're expecting humans to do a good job reviewing its output. And, per Kernighan's law:

> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?

And that's the ideal situation where you're the one who's written it: reading other people's code is generally harder than reading your own. So how do you expect to fare when you're reading nobody's code at all?

reply
i meant on a higher, agentic level where the AI's code is infallible. and that's going to happen very soon:

say: human wants to make a search engine that makes money for them.

1. for a task, ask several agents to make their own implementation and a super agent to evaluate each one and interrogate each agent and find the best implementation/variable names, and then explain to the human what exactly it does. or just mythos

2. the feature is something like "let videos be in search results, along with links"

3. human's job: "is it worth putting videos in this search engine? will it really drive profits higher? i guess people will stay on the search engine longer, but hmmm maybe not. maybe let's do some a/b testing and see whether it's worth implementing???" etc...

this is where the developer has to start thinking like a product manager. meaning his position is abolished and the product manager can do the "coding" part directly.

now this should be basic knowledge in 2026. i am just reading and writing back the same thing on HN omds.

reply
The AI's code is not going to be infallible any time soon. It's been "very soon" for the past 4 years, and the AI systems are still making the same kinds of mistakes, which are the mistakes you'd expect from a first-principles study of their model architectures. There's no straightforward path to modifying the systems we have now, to make them infallible.
reply