This isn't exactly a new problem we do it with any bit of new software/hardware, not just LLMs. We check its work when it's new, and then tend to trust it over time as it proves itself.
But it seems to be hitting us worse with LLMs, as they are less consistent than previous software. And LLM hallucinations are partially dangerous, because they are often plausible enough to pass the sniff test. We just aren't used to handling something this unpredictable.
What the OP pointed out is a fact of life.
We do many things to ensure that humans don’t get “routine fatigue”- like pointing at each item before a train leaves the station to ensure you don’t eyes glaze over during your safety check list.
This isn’t an excuse for the behavior. Its more about what the problem is and what a corresponding fix should address.
I think it slips because the consequences of sloppy journalism aren’t immediately felt. But as we’re witnessing in the U.S., a long decay of journalistic integrity contributes to tremendous harm.
It used to be that to be a “journalist” was a sacred responsibility. A member of the Fourth Estate, who must endeavour to maintain the confidence of the people.
Its like seeing a dog play basketball badly. You're too stunned to be like "no don't sign him to <home team>".
But at the same time, doing that makes it even more likely the human in the loop will get sloppy, because there'll be even fewer cases where their input is actually needed.
I'm wondering if you need to start inserting intentional canaries to validate if humans are actually doing sufficiently torough reviews.