Read through the comments here and mentally replace "journalist" with "developer" and wonder about the standards and expectations in play.
Food for thought on whether the users who rely on our software might feel similarly.
There are many directions to take this line of thinking, e.g. one argument would be "well, we pay journalists precisely because we expect them to check" or "in engineering we have test suites and can test deterministically", but I'm not sure any of them hold up. The "market pays for the checking" argument might also apply to developers reviewing AI code at some point, and those test suites increasingly get vibed into existence and only checked empirically, too.
Super interesting to compare.
- A rough equivalent here would be Windows shipping an update that bricks your PC or one of its basic features, which draws plenty of outrage. In both cases, the vendor shipped a critical flaw to production: factual correctness is crucial in journalism, and a quote is one of the worst things to get factually incorrect because it’s so unambiguous (inexcusable) and misrepresents who’s quoted (personal).
I’m 100% ok with journalists using AI as long as their articles are good, which at minimum means factually correct and not vacuous. Likewise, I’m 100% ok with developers using AI as long as their programs are good, which at minimum means decent UX and no major bugs.
So how is the "output" checked, then? Part of the reason code review is assumed necessary in the first place is that we can't actually empirically test everything we need to. If the software will programmatically delete the entire database next Wednesday, there is no way to test for that in advance. You would have to see it in the code.
If a journalist has little information and uses an LLM to make "something from nothing", that's when I take issue, because, like, what's the point?
Same thing as when I see managers dumping giant "Let's go team!!! 11" messages splattered with AI emoji diarrhea like sprinkles on brown frosting. I ain't reading that shit; could've been a one liner.
Even an (unreliable) LLM overview can be useful, as long as you check all facts with real sources, because it can give the framing necessary to understand the subject. For example, asking an LLM to explain some terminology that a source is using.
I would expect there is literally zero overlap between the "professionals"[1] who say "don't look at the code" and the ones criticising the "journalists"[2]. The former group tend to be maximalists and would likely cheer on the usage of LLMs to replace the work of the latter group, consequences be damned.
[1] The people that say this are not professional software developers, by the way. I still have not seen a single case of any vibe coder who makes useful software suitable for deployment at scale. If they make money, it is by grifting and acting as an "AI influencer", for instance Yegge shilling his memecoin for hundreds of thousands of dollars before it was rugpulled.
[2] Somebody who prompts an LLM to produce an article and does not even so much as fact-check the quotations it produces can clearly not be described as a journalist, either.
E.g. you technically don't need to look at the code if it's frontend code and part of the product is an e2e test which produces a video of the correct/full behavior via Playwright or similar.
Same with backend implementations whose instrumentation exposes enough tracing information to determine whether the expected modules were hit, etc.
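For concreteness, a rough sketch of that first idea, assuming a hypothetical checkout page; the URL, labels, and expected text are invented for illustration, not taken from the thread:

    // Hypothetical sketch (not from the thread): a Playwright e2e test that
    // records a video of the flow, so a reviewer could judge behavior without
    // reading the generated code. URL, labels, and texts are placeholders.
    import { test, expect } from '@playwright/test';

    test.use({ video: 'on' }); // keep a video of every run for review

    test('checkout flow shows a confirmation', async ({ page }) => {
      await page.goto('https://example.com/checkout');
      await page.getByLabel('Email').fill('reviewer@example.com');
      await page.getByRole('button', { name: 'Place order' }).click();
      await expect(page.getByText('Order confirmed')).toBeVisible();
      // The recording lands under test-results/ and could be attached to the PR.
    });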
I wouldn't want to work with coworkers who actually think that's a good idea, though.
And that's ignoring that your statement technically isn't even true, because the engineers actually working in such fields (designing bridges, airplanes, etc.) are very few.
The majority of them design products where safety isn't nearly as high-stakes as that... and they frequently overspec (wasting money) or underspec (increasing waste) to boot.
This point has been severely overstated on HN, honestly.
Sorry, but had to get that off my chest.
The electrical engineers at my employer who design building electrical distribution systems have software that handles all of the calculations; it’s just math. Arc flash hazard analysis, breaker coordination studies, available fault current, etc. All manufacturers provide the data needed to perform these calculations for their products.
Other engineering disciplines have similar tools. Mechanical, civil, and structural engineers all use software that simulates their designs.
Are you sure? Simulators and prototypes abound. By the time you’re building the real thing, it’s more like a rehearsal and solving a few problems instead of every intricacy in the formula.
Nothing new here, in software. What is new, is that AI is allowing dependency hell to be experienced by many other vocations.
[0]: https://arstechnica.com/civis/threads/journalistic-standards...
All threads have since been locked:
https://arstechnica.com/civis/threads/journalistic-standards...
https://arstechnica.com/civis/threads/is-there-going-to-be-a...
https://arstechnica.com/civis/threads/um-what-happened-to-th...
The sad thing is, I don't know of anywhere else that comes close to what Ars was before.
I'm genuinely asking - I subscribe to Ars - if their response isn't best-case, where could I even switch my subscription and RSS feed to?
Printing hallucinated quotes is a huge shock to their credibility, AI or not. Their credibility was already building up after one of their long-time contributors, a complete troll of a person who was a poison on their forums, went to prison for either pedophilia or soliciting sex from a minor.
Some seriously poor character judgement is going on over there. With all their fantastic reporters, I hope the editors explain this carefully.
Don't you mean diminishing or disappearing instead of building up?
Building up sounds like the exact opposite of what I think you're meaning. ;)
This isn't exactly a new problem; we do it with any bit of new software/hardware, not just LLMs. We check its work when it's new, and then tend to trust it over time as it proves itself.
But it seems to be hitting us worse with LLMs, as they are less consistent than previous software. And LLM hallucinations are particularly dangerous, because they are often plausible enough to pass the sniff test. We just aren't used to handling something this unpredictable.
What the OP pointed out is a fact of life.
We do many things to ensure that humans don’t get “routine fatigue” - like pointing at each item before a train leaves the station so that your eyes don’t glaze over during your safety checklist.
This isn’t an excuse for the behavior. It’s more about identifying what the problem is and what a corresponding fix should address.
I think it slips because the consequences of sloppy journalism aren’t immediately felt. But as we’re witnessing in the U.S., a long decay of journalistic integrity contributes to tremendous harm.
It used to be that to be a “journalist” was a sacred responsibility. A member of the Fourth Estate, who must endeavour to maintain the confidence of the people.
It’s like seeing a dog play basketball badly. You're too stunned to be like "no, don't sign him to <home team>".
But at the same time, doing that makes it even more likely the human in the loop will get sloppy, because there'll be even fewer cases where their input is actually needed.
I'm wondering if you need to start inserting intentional canaries to validate whether humans are actually doing sufficiently thorough reviews.
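One rough sketch of how that canary idea could look; the names, rate, and interfaces here are invented for illustration, not anything an actual review tool does:

    // Hypothetical sketch: occasionally plant a known-bad item in the review
    // queue and check whether the reviewer flags it.
    interface ReviewItem {
      id: string;
      content: string;
      isCanary: boolean;
    }

    const CANARY_RATE = 0.05; // roughly 1 in 20 items is a planted error

    function buildQueue(items: ReviewItem[], canaries: ReviewItem[]): ReviewItem[] {
      return items.flatMap((item) =>
        canaries.length > 0 && Math.random() < CANARY_RATE
          ? [item, { ...canaries[Math.floor(Math.random() * canaries.length)], isCanary: true }]
          : [item],
      );
    }

    // Canaries the reviewer approved anyway suggest the review was rubber-stamped.
    function missedCanaries(reviewed: ReviewItem[], approvedIds: Set<string>): ReviewItem[] {
      return reviewed.filter((item) => item.isCanary && approvedIds.has(item.id));
    }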
https://web.archive.org/web/20260213211721/https://arstechni...
>Scott Shambaugh here. None of the quotes you attribute to me in the second half of the article are accurate, and do not exist at the source you link. It appears that they themselves are AI hallucinations. The irony here is fantastic.
Instead of cross-checking the fake quotes against the source material, some proud Ars Subscriptors proceed to defend Condé Nast by accusing Scott of being a bot and/or fake account.
EDIT: Page 2 of the forum thread is archived too. This poster spoke too soon:
>Obviously this is massive breach of trust if true and I will likely end my pro sub if this isnt handled well but to the credit of ARS, having this comment section at all is what allows something like this to surface. So kudos on keeping this chat around.
It's that important.
This is what the author actually speculated may have occurred with Ars. Clearly something was lacking in the editorial process, though, if such things weren't human-verified either way.
How do you know quantum physics is real? Or radio waves? Or just health advice? We don't. We outsource our thinking around it to someone we trust, because thinking about everything to its root source would leave us paralyzed.
Most people seem to have never thought about the nature of truth and reality, and AI is giving them a wake-up call. Not to worry though. In 10 years everyone will take all this for granted, the way they take all the rest of the insanity of reality for granted.
"...it illustrates exactly the kind of unsupervised output that makes open source maintainers wary."
followed later on by
"[It] illustrates exactly the kind of unsupervised behavior that makes open source maintainers wary of AI contributions in the first place."
The utility is that the inference output tends to be right much more often than wrong for mainstream knowledge.
Misquotes and fabricated quotes have existed long before AI, and indeed, long before computers.
So you STILL have not read the original blog post. Please stop bickering until AFTER you have at least done that bare minimum of trivial due diligence. I'm sorry if it's TL;DR for you to handle, but if that's the case, then TL;DC : Too Long; Don't Comment.
I read the article.
My claim is as it has always been: if we accept that the misquotes exist, it does not follow that they were caused by hallucinations. To tell that, we would still need additional evidence. The logical thing to ask would be: has it been shown or admitted that the quotes were hallucinations?
Then you would be fully aware that the person who the quotes are attributed to has stated very clearly and emphatically that he did not say those things.
Are you implying he is an untrustworthy liar about his own words, when you claim it's impossible to prove they're not hallucinations?
I think calling the incorrect output of an LLM a “hallucination” is too kind to the companies creating these models, even if it’s technically accurate. “Being lied to” would be a more accurate description of how the end user feels.
Lying is deliberately deceiving, but yeah, to a reader, who is in effect a trusting customer who pays with part of their attention diverted to advertising, broadcasting a hallucination is essentially the same thing.
Vibe Posting without reading the article is as lazy as Vibe Coding without reading the code.
You don’t need a metaphysics seminar to evaluate this. The person being quoted showed up and said the quotes attributed to him are fake and not in the linked source:
https://infosec.exchange/@mttaggart/116065340523529645
>Scott Shambaugh here. None of the quotes you attribute to me in the second half of the article are accurate, and do not exist at the source you link. It appears that they themselves are AI hallucinations. The irony here is fantastic.
So stop retreating into “maybe it was something else” while refusing to read what you’re commenting on. Whether the fabrication came from an LLM or a human is not your get-out-of-reading-free card -- the failure is that fabricated quotes were published and attributed to a real person.
Please don’t comment again until you’ve read the original post and checked the archived Ars piece against the source it claims to quote. If you’re not willing to do that bare minimum, then you’re not being skeptical -- you’re just being lazy on purpose.
By what process do you imagine I arrived at the conclusion that the article suggested the published quotes were LLM hallucinations, when that was not mentioned in the article title?
You accuse me of performative skepticism, yet all I'm saying is that it is better to have evidence than assumptions, and better to ask whether that evidence exists.
That seems a much better approach than making false accusations based upon your own vibes. I don't think Scott Shambaugh went to that level, though.
The right thing to do would be a mea-culpa style post and explain what went wrong, but I suspect the article will simply remain taken down and Ars will pretend this never happened.
I loved Ars in the early years, but I'd argue since the Conde Nast acquisition in 2008 the site has been a shadow of its former self for a long time, trading on a formerly trusted brand name that recent iterations simply don't live up to anymore.
I'm basically getting tech news from social media sites now and I don't like that.
I think there are enough of us who are hungry for this, both as creators and consumers. To make goods and services that are truly what people want.
Maybe the AI revolution will spark a backlash that will lead to a new economy with new values. Sustainable business which don't need to squeeze their customers for every last penny of revenue. Which are happy to reinvest their profits into their products and employees.
Maybe.
You need to set up an email address and a browser just for sites that require registration.
You may be fine with damning one or the other before all the facts are known, zahlman, but not all of us are.
It's a slop job now.
Ars Technica, a supposedly reputable institution, has no editorial review. No checks. Just a lazy slop cannon journalist prompting an LLM to research and write articles for her.
Ask yourself if you think it's much different at other publications.
The ones that remain are probably at some extreme on one or more attributes (e.g. overworked, underpaid) and are leaning on genAI out of desperation.
We’ll know more in only a couple days — how about we wait that long before administering punishment?
EDIT: And there's no plausible deniability for this like there is for typos, or maligned sources. Nobody typed these quotes out and went "oops, that's not what Scott said". Benj Edwards or Kyle Orland pulled the lever on the bullshit slot machine and attacked someone's integrity with the result.
"In the past, though, the threat of anonymous drive-by character assassination at least required a human to be behind the attack. Now, the potential exists for AI-generated invective to infect your online footprint."
Now to be clear, that’s a hypothetical and who knows what the actual story is — but whatever it is, it will emerge in mere days. I can wait that long before throwing away two lives, even if you can’t.
> Bubbling that liability up to Arse Technica is valuable for punishing them
Evaluating whether Ars Technica establishes credible accountability mechanisms, such as hiring an Ombud, is at least as important as punishing individuals.
Did Ars respond in any way after the conviction of their ex-writer? Better vetting of their hires might have been a response. Apparently there was a record of some questionable opinions held by the ex-writer. I don't know, personally, if any of their policies changed.
The currently suspected bad behavior involves the possibility that the journalists lacked integrity in their work. So if this is confirmed, I expect to see publicly announced structural changes in the editorial process at Ars Technica if I am to continue as a subscriber and reader.
1 https://arstechnica.com/civis/threads/ex-ars-writer-sentence...
Edit: Fixed italics issue
> If you are the person who deployed this agent, please reach out. It’s important for us to understand this failure mode, and to that end we need to know what model this was running on and what was in the soul document. I’m not upset and you can contact me anonymously if you’d like.
I can see where he's coming from, and I suppose he's being the bigger man in the situation, but at some point one of these reckless moltbrain kiddies is going to have to pay. Libel and extortion should carry penalties no matter whether you do it directly, or via code that you wrote, or via code that you deployed without reading it.
The AI's hit piece on Scott was pretty minor, so if we want to wait around for a more serious injury that's fine, just as long as we're standing ready to prosecute when (not 'if') it happens.