If it's not worth reading something where the writer didn't take the time to write it, by extension that means nobody read the code.
Which means nobody understands it, beyond the external behaviour they've tested.
I'd have some issues with using such software, at least where reliability matters. Blackbox testing only gets you so far.
But I guess as opposed to other types of writing, developers _do_ read generated code. At least as soon as something goes wrong.
We tell stories of Therac 25 but 90% of software out there doesn’t kill people. Annoys people and wastes time yes, but reliability doesn’t matter as much.
E-mail, internet and networking, operations on floating point numbers are only kind of somewhat reliable. No one is saying they will not use email because it might not be delivered.
As we give more and more autonomy to agents, that % may change. Just yesterday I was looking at hexapods and the first thing it tells you ( with a disclaimer its for competitions only ) that it has a lot of space for weapon install. I had to briefly look at the website to make sure I did not accidentally click on some satirical link.
I gotta disagree with you there! Code that isn't read doesn't do anything. Code must be read to be compiled, it must be read to be interpreted, etc.
I think this points to a difference in our understanding of "read" means, perhaps? To expand my pithy "not gonna read if you didn't write" bit: The idea that code stands on its own is a lie. The world changes around code and code must be changed to keep up with the world. Every "program" (is the git I run the same as the git you run?) is a living document that people maintain as need be. So when we extend the "not read / didn't write" it's not using the program (which I guess is like taking the lessons from a book) it's maintaining the program.
So I think it's possible that I could derive benefit from someone else reading an llm's text output (they get an idea) - but what we are trying to talk about is the work of maintaining a text.
For example, I have a few letter generators on my website. The letters are often verified by a lawyer, but the generator could totally be vibe-coded. It's basically an HTML form that fills in the blanks in the template. Other tools are basically "take input, run calculation, show output". If I can plug in a well-tested calculation, AI could easily build the rest of the tool. I have been staunchly against using AI in my line of work, but this is an acceptable use of it.
But isn't this the distinction that language models are collapsing? There are 'prose' prompt collections that certainly make (programmatic) things happen, just as there is significant concern about the effect of LLM-generated prose on social media, influence campaigns, etc.
"One" is the operative word here, supposing this includes only humans and excludes AI agents. When code is executed, it does get read (by the computer). Making that happen is a conscious choice on the part of a human operator.
The same kind of conscious choice can feed writing to an LLM to see what it does in response. That is much the same kind of "execution", just non-deterministic (and, when given any tools beyond standard input and standard output, potentially dangerous in all the same ways, but worse because of the nondeterminism).
And to answer you more directly, generally, in my professional world, I don't use closed source software often for security reasons, and when I do, it's from major players with oodles of more resources and capital expenditure than "some guy with a credit card paid for a gemini subscription."
I got Claude to make a test suite the other day for a couple RFCs so I could check for spec compliance. It made a test runner and about 300 tests. And an html frontend to view the test results in a big table. Claude and I wrote 8500 lines of code in a day.
I don’t care how the test runner works, so long as it works. I really just care about the test results. Is it finding real bugs? Well, we went though the 60 or so failing tests. We changed 3 tests, because Claude had misunderstood the rfc. The rest were real bugs.
I’m sure the test runner would be more beautiful if I wrote it by hand. But I don’t care. I’ve written test runners before. They’re not interesting. I’m all for beautiful, artisanal code. I love programming. But sometimes I just want to get a job done. Sometimes the code isn’t for reading. It’s for running.
It works, sure, but is it worth your time to use? I think a common blind spot for software engineers is understanding how hard it is to get people to use software they aren’t effectively forced to use (through work or in order to gain access to something or ‘network effects’ or whatever).
Most people’s time and attention is precious, their habits are ingrained, and they are fundamentally pretty lazy.
And people that don’t fall into the ‘most people’ I just described, probably won’t want to use software you had an LLM write up when they could have just done it themselves to meet their exact need. UNLESS it’s something very novel that came from a bit of innovation that LLMs are incapable of. But that bit isn’t what we are talking about here, I don’t think.
This is something I like about the LLM future. I get to spend my time with users thinking about their needs and how the product itself could be improved. The AI can write all the CSS and sql queries or whatever to actually implement those features.
If the interesting thing about software is the code itself - like the concepts and so on, then yeah do that yourself. I like working with CRDTs because they’re a fun little puzzle. But most code isn’t like that. Most code just needs to move some text from over here to over there. For code like that, it’s the user experience that’s interesting. I’m happy to offload the grunt work to Claude.
Sure... to a point. But realistically, the "use an LLM to write it yourself" approach still entails costs, both up-front and on-going, even if the cost may be much less than in the past. There's still reason to use software that's provided "off the shelf", and to some extent there's reason to look at it from a "I don't care how you wrote it, as long as it works" mindset.
came from a bit of innovation that LLMs are incapable of.
I think you're making an overly binary distinction on something that is more of a continuum, vis-a-vis "written by human vs written by LLM". There's a middle ground of "written by human and LLM together". I mean, the people building stuff using something like SpecKit or OpenSpec still spend a lot of time up-front defining the tech stack, requirements, features, guardrails, etc. of their project, and iterating on the generated code. Some probably even still hand tune some of the generated code. So should we reject their projects just because they used an LLM at all, or ?? I don't know. At least for me, that might be a step further than I'd go.
Absolutely, but I’d categorize that ‘bit’ as the innovation from the human. I guess it’s usually just ongoing validation that the software is headed down a path of usefulness which is hard to specify up-front and by definition something only the user (or a very good proxy) can do (and even they are usually bad at it).
Agreed.
I see a lot of these discussions where a person gets feelings/feels mad about something and suddenly a lot of black and white thinking starts happening. I guess that's just part of being human.
Although I love science, I'm much happier building programs. "Does the program do what the client expects with reasonable performance and safety? Yes? Ship it."
I wonder if this is a major differentiator between AI fans and detractors. I dislike and actively avoid anything closed source. I fully agree with the premise of the submission as well.
this is the literary equivalent of compiling and running the code.
Okay but it is probably not going to be a tool that will be reliable or work as expected for too long depending on how complex it is, how easily it can be understood, and how it can handle updates to libraries, etc. that it is using.
Also, what is our trust with this “tool”? E.g. this is to be used in a brain surgery that you’ll undergo, would you still be fine with using something generated by AI?
Earlier you couldn’t even read something it generated, but we’ll trust a “tool” it created because we believe it works? Why do we believe it will work? Because a computer created it? That’s our own bias towards computing that we assume that it is impartial but this is a probabilistic model trained on data that is just as biased as we are.
I cannot imagine that you have not witnessed these models creating false information that you were able to identify. Understanding their failure on basic understandings, how then could we trust it with engineering tasks? Just because “it works”? What does that mean and how can we be certain? QA perhaps but ask any engineer here if companies are giving a single shit about QA while they’re making them shove out so much slop, and the answer is going to be disappointing.
I don’t think we should trust these things even if we’re not developers. There isn’t anyone to hold accountable if (and when) things go wrong with their outputs.
All I have seen AI be extremely good at is deceiving people, and that is my true concern with generative technologies. Then I must ask, if we know that its only effective use case is deception, why then should I trust ANY tool it created?
Maybe the stakes are quite low, maybe it is just a video player that you use to watch your Sword and Sandal flicks. Ok sure, but maybe someone uses that same video player for an exoscope and the data it is presenting to your neurosurgeon is incorrect causing them to perform an action they otherwise would have not done if provided with the correct information.
We should not be so laissez-faire with this technology.