To aim for a meeting of the minds... Would you help me out and unpack what you mean so there is less ambiguity? This might be minor terminological confusion. It is possible we have different takes, though -- that's what I'm trying to figure out.
There are at least two senses of 'correctness' that people sometimes mean: (a) correctness relative to a formal spec: this is expensive but doable*; (b) confidence that a spec matches human intent: IMO, usually a messy decision involving governance, organizational priorities, and resource constraints.
Sometimes people refer to software correctness problems in a very general sense, but I find it hard to parse those. I'm familiar with particular theoretical results such as Rice's theorem and the halting problem that pertain to arbitrary programs.
* With tools like {Lean, Dafny, Verus, Coq} and in projects like {CompCert, seL4}.
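To make the (a) vs (b) distinction concrete, here's a rough sketch. All names are made up for illustration, and a runtime check stands in for what a real prover would establish statically. The "spec" only says the output is sorted, so an implementation that returns an empty list is correct in sense (a) while obviously missing the intent in sense (b).

```python
from collections import Counter


def is_sorted(xs):
    """The spec as written: adjacent elements are in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def is_permutation(xs, ys):
    """The part of the intent the spec forgot: same elements, same counts."""
    return Counter(xs) == Counter(ys)


def lazy_sort(xs):
    """Hypothetical implementation that satisfies is_sorted for every input."""
    return []


def real_sort(xs):
    """What the spec author presumably wanted."""
    return sorted(xs)


data = [3, 1, 2, 1]

# Sense (a): both implementations are "correct" relative to the written spec.
print(is_sorted(lazy_sort(data)), is_sorted(real_sort(data)))

# Sense (b): only one of them matches what a human meant by "sort".
print(is_permutation(data, lazy_sort(data)), is_permutation(data, real_sort(data)))
```

Closing that gap means someone has to notice the missing permutation clause, which is a judgment call about intent rather than something the prover can tell you.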
Since nobody involved actually cares whether the code works or not, it doesn't matter whether it's a different wrong thing each time.
If anyone cared enough they could look at the code and see the problem immediately and with little effort, but we're encouraging a world where no one cares enough to put even that baseline effort because *gestures at* the tests are passing. Who cares how wrong the code is and in what ways if all the lights are green?
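A toy illustration of "all the lights are green" (the function and the test cases are hypothetical, picked only to show the shape of the problem): the thin suite below passes, yet anyone reading the one-line body would spot that the century rule is missing.

```python
import calendar


def is_leap_year(year):
    """Hypothetical buggy implementation: ignores the 100/400 century rules."""
    return year % 4 == 0


def run_tests():
    """A thin suite that happens not to include a non-leap century year."""
    cases = {2024: True, 2023: False, 2000: True}
    return all(is_leap_year(y) == expected for y, expected in cases.items())


# The dashboard is green...
print("tests pass:", run_tests())

# ...but the code disagrees with the standard library on an untested input.
print("ours:", is_leap_year(1900), "stdlib:", calendar.isleap(1900))
```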
If the spec is so complete that it covers everything, you might as well write the code.
The benefit of writing a spec and having the LLM code it is that the LLM will fill in a lot of blanks. And it is this filling in of blanks that is non-deterministic.
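A small sketch of what "filling in blanks" can look like, assuming a made-up spec of "remove duplicates from the list" (every name here is illustrative). The spec never mentions output order, so two implementations can both satisfy it and still return different results.

```python
def meets_spec(original, result):
    """The spec as written: no duplicates, and exactly the same set of values."""
    return len(result) == len(set(result)) and set(result) == set(original)


def dedupe_keep_first(xs):
    """One way to fill the blank: keep the first occurrence, preserve order."""
    return list(dict.fromkeys(xs))


def dedupe_sorted(xs):
    """Another way to fill the blank: return the unique values sorted."""
    return sorted(set(xs))


data = ["b", "a", "b", "c", "a"]

first = dedupe_keep_first(data)
ordered = dedupe_sorted(data)

# Both satisfy the spec...
print(meets_spec(data, first), meets_spec(data, ordered))

# ...but a caller that depends on order sees different behavior.
print(first, ordered, first == ordered)
```

Which of the two you get on a given run is exactly the part the spec left open.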
Welcome to the usual offshoring experience.
Except one shoe is made by children in a fire-trap sweatshop with no breaks, and the other was made by a well paid adult in good working conditions.
The ends don’t justify the means. The process of making impacts the output in ways that are subtle and important, but even holding the output as a fixed thing - the process of making still matters, at least to the people making it.
And guess how much the shoe companies that manufacture in sweatshop conditions make versus the ones that make artisanal handcrafted shoes?
Btw in my metaphor, we - the programmers - are the kids in the sweatshop.
Even on the BigTech side, being able to reverse a btree on the whiteboard and having on your resume that you were a mid level developer isn’t enough anymore.
If you look at the comp on that side, it’s also stagnated for a decade. AI has just accelerated that trend.
While my job has been at various percentages to produce code for 30 years, it’s been well over a decade since I had to sell myself on “I codez real gud”. I sell myself as a “software engineer” who can go from ambiguous business and technical requirements, deal with politics, XYProblems, etc
That’s exactly my point. “Programming” was clearly becoming commoditized a decade ago.
Out-of-bounds behavior is sometimes a known unknown, but in the era of generated code it is exclusively unknown unknowns.
Good luck speccing out all the unanticipated side effects and undefined behaviors. Perhaps you can prompt the agent in a loop a number of times, but it's hard to believe that the brute-force throw-more-tokens-at-it approach has the same level of return as a more attentive audit by human eyeballs.
I don’t review every line of code by everyone whose output I’m responsible for, I ask them to explain how they did things and care about their testing, the functional and non functional requirements and hotspots like concurrency, data access patterns, architectural issues etc.
For instance, I haven’t done web development since 2002 except for a little copy and paste work. I completely vibe coded three internal web admin sites for separate projects and used Amazon Cognito for authentication. I didn’t look at a line of code that AI generated any more than I would have looked at a line of code for a website I delegated to the web developer. I cared about functionality and UX.
Be it shoes, offshoring, Webwidgets or AI generated code.
But we’re the shoemakers, not the consumers. It’s actually our job to preserve our own and our peers’ quality of life.
The cheapest good option possible doesn’t have to be the sweatshop - tho the shareholders of Nike or Zara would have you believe that.
How you define your proof is up to you. It might be a simple test, or an exhaustive suite of tests, or a formal proof. It doesn't matter. If the output of the code is correct by your definition, then it doesn't matter what the underlying code actually is.