It doesn't mean the AI got good; it means humans now mistake other humans for AI, which is itself a form of passing the test.
The adversarial version with humans involved is actually easier to pass because of this: real humans wouldn't pass your non-adversarial version.
This includes cases where someone basically disappeared from e.g. Stack Overflow at some point before the release of ChatGPT, having written a bunch of posts that barely demonstrate functional literacy or comprehension of English, and then came back afterward posting long messages with impeccable grammar and spelling in textbook "LLM house style".
The people falsely accused because they've used em dashes for 20 years aren't the ones who were functionally illiterate before.
(well, that and the "it's not just X, it's Y!" pattern they seem to love)
Edit: folks, the standard Turing test involves a computer and a human, with a judge communicating with both and giving a verdict about which one is the human. The percentages for the two entities being judged will therefore add up to exactly 100%. That's how this test was conducted. Please don't assume I'm a moron.
Given that structure, you can draw a conclusion from that data point.
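The complementarity claim above can be sketched in a few lines. This is a hypothetical simulation, not the protocol from any actual study: `run_trials` and `p_judge_correct` are made-up names, and the judge is modeled as a coin weighted toward the real human. The point is only that in a forced-choice setup, where each judge must label exactly one of the two subjects "human", the two judged-human rates always sum to exactly 100%.

```python
import random

def run_trials(n_trials: int, p_judge_correct: float, seed: int = 0) -> tuple[float, float]:
    """Simulate a forced-choice Turing test.

    In each trial a judge converses with one human and one machine and must
    label exactly one of them 'human'. p_judge_correct is the (assumed)
    probability the judge picks the actual human. Returns the fraction of
    trials in which each subject was judged human.
    """
    rng = random.Random(seed)
    # Count trials where the 'human' label went to the actual human.
    human_votes = sum(1 for _ in range(n_trials) if rng.random() < p_judge_correct)
    # In every remaining trial, the machine necessarily got the label.
    machine_votes = n_trials - human_votes
    return human_votes / n_trials, machine_votes / n_trials

h, m = run_trials(10_000, p_judge_correct=0.6)
# Exactly one 'human' verdict per trial, so the rates are complements:
assert h + m == 1.0
```

So a headline like "the AI was judged human 40% of the time" automatically implies the real human was judged human only 60% of the time; neither number is interpretable without the other.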
>The more interesting Turing-style test would be one that gets repeated many times with many interviewers in the original adversarial setting, where both the human subject & AI subject are attempting to convince the interviewer that they're human.
In addition, I think it's reasonable to select interviewers with at least some familiarity with the strengths and weaknesses of the AI, instead of random credulous people who aren't very good at asking the right questions.
There is also the $20,000 bet between Kurzweil and Kapor, which still hasn't been resolved. https://longbets.org/1/