Sure there is. If you want to know if students understand the material, you don't hand out the answers to the test ahead of time.

Collecting a bunch of "hard questions for LLMs" in one place invariably invites Goodhart's law ("when a measure becomes a target, it ceases to be a good measure"). You'll have no idea whether the next round of LLMs does better because the models are generally smarter, or because they were trained specifically on these questions.

Generic biases can also be fixed.

*Some generic biases. Others, like recency bias, the serial-position effect, the "pink elephant" effect, and poor negation accuracy, seem fairly fundamental and are unlikely to be fixed without architectural changes, if at all. Behaviors that exploit in-context learning and the model's native context formatting are also hard to suppress during training without making the model worse.
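To make one of these concrete: a common way to probe recency bias is to ask the model the same multiple-choice question under every ordering of the options and check whether it disproportionately picks the last-listed one. A minimal sketch of such a harness, where `ask_model` is a hypothetical stand-in for a real LLM call (here stubbed with a maximally biased model for illustration):

```python
from itertools import permutations

def probe_recency_bias(ask_model, question, options):
    """Ask the same question with every ordering of the options and
    return how often the model picks the option shown last."""
    last_picks = 0
    trials = 0
    for perm in permutations(options):
        prompt = question + "\n" + "\n".join(f"- {o}" for o in perm)
        answer = ask_model(prompt)
        trials += 1
        if answer == perm[-1]:
            last_picks += 1
    # An unbiased model's rate should hover near 1/len(options);
    # a rate far above that suggests recency bias.
    return last_picks / trials

# Stub "model" that always parrots the last option it saw.
def always_last(prompt):
    return prompt.splitlines()[-1].removeprefix("- ")

rate = probe_recency_bias(always_last, "Which of these is a prime number?",
                          ["9", "15", "7"])
print(rate)  # 1.0 for this fully recency-biased stub; ~1/3 for an unbiased one
```

The permutation step matters: averaging over all orderings separates position preference from the model simply knowing (or not knowing) the answer.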