It doesn't mean the AI got good; it means humans now mistake other humans for AI, which is itself a form of passing the test.
The adversarial version with humans involved is actually easier to pass because of this: real humans wouldn't pass your non-adversarial version.
This includes cases where someone basically disappeared from e.g. Stack Overflow at some point before the release of ChatGPT, having written a bunch of posts that barely demonstrate functional literacy or comprehension of English, and then came back afterward posting long messages with impeccable grammar and spelling in textbook "LLM house style".
The people falsely accused because they've used em dashes for 20 years aren't the ones who were functionally illiterate before.
(well, that and the "it's not just X, it's Y!" pattern they seem to love)
Edit: folks, the standard Turing test involves a computer and a human, with a judge communicating with both and giving a verdict about which one is the human. The percentages for the two entities being judged will therefore add up to exactly 100%. That's how this test was conducted. Please don't assume I'm a moron.
Given that structure, you can draw a conclusion from that data point.
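The complementarity claim above can be sketched in a few lines. This is a hypothetical simulation, not the protocol from any actual study: `run_trials` and `p_judge_correct` are made-up names, and the judge is modeled as a coin weighted toward the real human. The point is only that in a forced-choice setup, where each judge must label exactly one of the two subjects "human", the two judged-human rates always sum to exactly 100%.

```python
import random

def run_trials(n_trials: int, p_judge_correct: float, seed: int = 0) -> tuple[float, float]:
    """Simulate a forced-choice Turing test.

    In each trial a judge converses with one human and one machine and must
    label exactly one of them 'human'. p_judge_correct is the (assumed)
    probability the judge picks the actual human. Returns the fraction of
    trials in which each subject was judged human.
    """
    rng = random.Random(seed)
    # Count trials where the 'human' label went to the actual human.
    human_votes = sum(1 for _ in range(n_trials) if rng.random() < p_judge_correct)
    # In every remaining trial, the machine necessarily got the label.
    machine_votes = n_trials - human_votes
    return human_votes / n_trials, machine_votes / n_trials

h, m = run_trials(10_000, p_judge_correct=0.6)
# Exactly one 'human' verdict per trial, so the rates are complements:
assert h + m == 1.0
```

So a headline like "the AI was judged human 40% of the time" automatically implies the real human was judged human only 60% of the time; neither number is interpretable without the other.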
>The more interesting Turing-style test would be one that gets repeated many times with many interviewers in the original adversarial setting, where both the human subject & AI subject are attempting to convince the interviewer that they're human.
In addition, I think it's reasonable to select interviewers with at least some familiarity with the strengths and weaknesses of the AI, instead of random credulous people who aren't very good at asking the right questions.
There is also the $20,000 bet between Kurzweil and Kapor, which still hasn't been resolved. https://longbets.org/1/