Hacker News
by
simonw
19 hours ago
by
irthomasthomas
19 hours ago
Try llm-consortium with --judging-method rank
by
andriy_koval
19 hours ago
I think it will make results way better and more representative of model abilities.
by
simonw
19 hours ago
It would... but the test is inherently silly, so I'm still not sure it's worth investing that extra effort in it.