Look at the tasks in the benchmark (see §2 of https://arxiv.org/html/2503.14499v3).