Humans aren't very diligent in the long term. If an LLM does something correctly enough times in a row (or close enough), humans are likely to stop checking its work thoroughly enough.

This isn't exactly a new problem; we do it with any piece of new software or hardware, not just LLMs. We check its work when it's new, and then tend to trust it over time as it proves itself.

But it seems to be hitting us worse with LLMs, as they are less consistent than previous software. And LLM hallucinations are particularly dangerous, because they are often plausible enough to pass the sniff test. We just aren't used to handling something this unpredictable.

reply
It’s a core part of the job and there’s simply no excuse for complacency.
reply
There's not a human alive that isn't complacent in many ways.
reply
You're being way too easy on a journalist.
reply
And too easy on the editor who was supposed to personally verify that the article was properly sourced prior to publication. This is like basic stuff that you learn working on a high school newspaper.
reply
lol true
reply
The words on the page are just a medium to sell ads. If shit gets ad views then producing shit is part of the job... unless you're the one stepping up to cut the checks.
reply
Ars also sells ad-free subscriptions.
reply
This is a first-order expectation of most businesses.

What the OP pointed out is a fact of life.

We do many things to ensure that humans don't get "routine fatigue", like pointing at each item before a train leaves the station so your eyes don't glaze over during the safety checklist.

This isn’t an excuse for the behavior. It's more about what the problem is and what a corresponding fix should address.

reply
I agree. The role of an editor is in part to do this train pointing.

I think it slips because the consequences of sloppy journalism aren’t immediately felt. But as we’re witnessing in the U.S., a long decay of journalistic integrity contributes to tremendous harm.

It used to be that to be a “journalist” was a sacred responsibility. A member of the Fourth Estate, who must endeavour to maintain the confidence of the people.

reply
There's a weird inconsistency among the more pro-AI people: they expect this output to pass as human, but then don't give it the review that an outsourced human's work would get.
reply
> but then don't give it the review that an outsourced human would get.

It's like seeing a dog play basketball badly. You're too stunned to be like "no don't sign him to <home team>".

reply
Surely the rules would stop such a thing from happening!
reply
The irony is that, while far from perfect, an LLM-based fact-checking agent is likely to be far more diligent (though it still needs human review as well), because it's trivial to ensure it has no memory of having already worked through a long list of checks (if you pass e.g. Claude a long list directly in the same context, it is prone to deciding the task is "tedious" and starting to take shortcuts).

But at the same time, doing that makes it even more likely the human in the loop will get sloppy, because there'll be even fewer cases where their input is actually needed.

I'm wondering if you need to start inserting intentional canaries to validate whether humans are actually doing sufficiently thorough reviews.
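
Roughly, something like the minimal sketch below: one claim per fresh API call so the model never sees how long the overall list is, plus a known-false canary mixed into each batch to catch rubber-stamping reviewers. It assumes the Anthropic Python SDK; the model name, prompt wording, and canary list are placeholders I made up, and the output still needs human review as noted above.

    # Sketch of "fresh context per claim" fact-checking with reviewer canaries.
    # Assumes the Anthropic Python SDK and ANTHROPIC_API_KEY in the environment.
    import random
    import anthropic

    client = anthropic.Anthropic()

    CANARIES = [
        # Known-false claims injected to check whether the human reviewer
        # actually reads the agent's verdicts instead of approving them blindly.
        "The article was published in 1987.",
    ]

    def check_claim(claim: str, source_text: str) -> str:
        # One claim per API call: each check runs in a fresh context, so the
        # model has no memory of having already done a long, "tedious" list.
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model name
            max_tokens=300,
            messages=[{
                "role": "user",
                "content": (
                    "Does the source text support this claim? "
                    "Answer SUPPORTED or UNSUPPORTED, then explain briefly.\n\n"
                    f"Claim: {claim}\n\nSource:\n{source_text}"
                ),
            }],
        )
        return response.content[0].text

    def review_batch(claims: list[str], source_text: str) -> list[tuple[str, str]]:
        # Mix one canary into the batch; if the reviewer signs off on its
        # verdict without comment, they weren't really reviewing.
        batch = claims + [random.choice(CANARIES)]
        random.shuffle(batch)
        return [(claim, check_claim(claim, source_text)) for claim in batch]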

reply
The kind of people who use an LLM to write news articles for them tend not to be the people who care about mundane things like reading sources or ensuring that what they write bears any resemblance to the truth.
reply
The problem is that the LLM's sources can themselves be LLM-generated. I was looking up a health question and clicked through to see the source for one of the LLM's claims. The source was a blog post that contained an obvious hallucination or false elaboration.
reply
The source would just be the article, which the Ars author used an LLM to avoid reading in the first place.
reply