https://simonwillison.net/2025/Nov/13/training-for-pelicans-...
"Give me an illustration of a bicycle riding by a pelican"
"Give me an illustration of a bicycle riding over a pelican"
"Give me an illustration of a bicycle riding under a flying pelican"
And so on and so forth. Or will it start to look like the Studio C sketch about Lobster Bisque? https://youtu.be/A2KCGQhVRTE
I wouldn't really even call it "cheating", since it has improved models' ability to generate artistic SVG imagery more broadly, but the days of this being an effective way to evaluate a model's "interdisciplinary" visual reasoning abilities have long since passed, IMO.
It's become yet another example in the ever-growing list of benchmaxxed targets whose original purpose was defeated by teaching to the test.
https://x.com/jeffdean/status/2024525132266688757?s=46&t=ZjF...