Hacker News
by smlacy 2 hours ago
by titanomachy 15 minutes ago
Then your internal benchmarks will be in the post-training set and you’ll have to make new ones.
by thorum 2 hours ago
I have the opposite experience: random HN/Reddit comments saying “this sucks” or “whoa this is a huge improvement” are the only benchmark that means anything. Standard benchmarks are all gamed and don’t capture the complexity of the real world.