by wat10000 | 4 hours ago | comments
dwpdwpdwpdwpdwp | 3 hours ago
The implication would be that GPT-4.5 was not judged to be human 27% of the time. You can't determine how often humans were judged correctly as humans from that data point.
jmalicki | 3 hours ago
The structure of the test was that there was one human and one AI conversation partner, and the rater had to choose which one was which. Given that structure, you can infer it from that data point.
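The point above is just the complement rule: in a two-alternative forced choice with exactly one human and one AI, every trial where the AI is picked as the human is a trial where the real human is not. A minimal sketch (the 0.73 figure is the GPT-4.5 number implied by the thread's 27%):

```python
# In a paired, forced-choice Turing test the interrogator sees one human
# and one AI and must pick which one is the human. The two outcomes are
# complementary: picking the AI means not picking the human.
def human_pick_rate(ai_pick_rate: float) -> float:
    """Rate at which the real human was correctly picked as the human."""
    return 1.0 - ai_pick_rate

# If GPT-4.5 was chosen as the human 73% of the time, the actual human
# was chosen only 27% of the time.
print(round(human_pick_rate(0.73), 2))
```

This complementarity is exactly why, under this paired design, the AI's "judged human" rate does pin down the humans' rate, which would not hold if each partner were rated independently.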
jmalicki | 2 hours ago
That was also before the crazy AI hysteria we have today with the em-dash police everywhere.
wat10000 | 17 minutes ago
For the test to be free of bias, we’ll have to ensure all the humans are from Nigeria.
Melatonic | 3 hours ago
Those stats don't necessarily line up that way. Do you have a link?
jmalicki | 3 hours ago
Given the way the test was structured, it does line up.
https://arxiv.org/abs/2503.23674
Melatonic | 2 hours ago
Surprisingly good. I wonder how they would have done without the 5-minute limit on conversations (an average of 8 messages per conversation, per the study).