The field is advancing so fast that it's hard to do real science, as there will be a new SOTA by the time you're ready to publish results. I think this is a combination of that and people having a laugh.
Would you mind sharing which benchmarks you think are useful measures for multimodal reasoning?