Because both Nano Banana Pro and ChatGPT Images 2.0 have touted strong reasoning capabilities, and this particular prompt has objective, easy-to-validate criteria, as opposed to the subjective judgment that images usually require.

I have more subjective prompts to test reasoning, but they're your-mileage-may-vary (however, gpt-2-image has surprisingly been doing much better on more objective criteria in my test cases).

by o10449366 10 hours ago

[flagged]

by minimaxir 10 hours ago

"Quirky and obscure" has the functional benefit of ensuring the source question is not in the training data and falls outside the median user prompt, which makes the model less likely to cheat.

We have enough people complaining about Simon Willison's pelican test.

by o10449366 5 hours ago

When you program, do you consider using your prior knowledge of programming cheating?

by Bjartr 9 hours ago

What would make the prompt a better actual evaluation in your judgement?

by leptons 6 hours ago

Not focusing on Pokémon, for a start. Maybe use something more people can recognize and evaluate. I have zero knowledge of Pokémon; I see it as a niche thing for ultra-nerdy people, not something everyone is familiar with. Nothing about that test can be evaluated by anyone but a Pokémon expert. Sorry, but Pokémon isn't as mainstream as some people might think it is.

by tailscaler2026 9 hours ago

still #opentowork huh

by beepbooptheory 8 hours ago

Where does one even use that hashtag?

by minimaxir 7 hours ago

It's a LinkedIn joke.

by codemog 10 hours ago

Ah yes, also known as C++ enjoyers.