
I am running a similar experiment, but so far changing the OpenAI seed seems to give similar results. If that holds up, it concerns me how sensitive this could be.
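One rough way to put a number on that sensitivity is to score the same item once per seed and look at the spread. This is only a sketch: `seed_spread` and `fake_judge` are hypothetical names, and `fake_judge` is a random stand-in for a real model call, not anything from the experiment itself.

```python
import random
import statistics
from typing import Callable, Iterable


def seed_spread(judge: Callable[[str, int], float], item: str, seeds: Iterable[int]) -> dict:
    """Score the same item once per seed and summarize how much the score moves."""
    scores = [judge(item, seed) for seed in seeds]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": statistics.pstdev(scores),  # population std dev across seeds
        "range": max(scores) - min(scores),
    }


def fake_judge(item: str, seed: int) -> float:
    """Hypothetical stand-in for an LLM judge call; returns a score in [60, 90]."""
    rng = random.Random(f"{item}:{seed}")  # deterministic per (item, seed) pair
    return round(rng.uniform(60, 90), 1)


if __name__ == "__main__":
    print(seed_spread(fake_judge, "answer A", range(5)))
```

Swapping `fake_judge` for a function that actually calls the model with the given seed would show whether the spread stays small or not.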

by dangoodmanUT 104 days ago


I found the opposite. GPT-5 is better at judging along a true gradient of scores, while Gemini loves to pick 100%, 20%, 10%, 5%, or 0%. You never get something like an 87% score.

by lukasb 104 days ago


Interesting, I'll give voting a shot, thanks.