73% judged GPT 4.5 (edit: had incorrectly said 4o before) to be the human.
https://arxiv.org/abs/2503.23674
Not only are people bad at judging this, they are directionally wrong.
> Our experiments show that annotators who frequently use LLMs for writing tasks excel at detecting AI-generated text, even without any specialized training or feedback. In fact, the majority vote among five such “expert” annotators misclassifies only 1 of 300 articles, significantly outperforming most commercial and open-source detectors we evaluated even in the presence of evasion tactics like paraphrasing and humanization.