Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?
reply
SVG generation is a good test because it's extremely easy to subjectively assess with visual reasoning where humans are strong. However, pelican on a bike specifically may be overused at this point.
reply
The "big beak!" comment in the SVG source makes me think it's definitely a gamed "benchmark" at this point.
reply
Do you think the models are ready for the next level? I believe that would be: Pelican feeding Spaghetti to Will Smith.
reply
Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.
reply
It is more an example of gaming (the HN system) than a meme.
reply
I'd be very surprised if this is in the training data given that most models mess it up to this day. E.g. look at the ones from Opus.
reply
[flagged]
reply
I really don't understand what's interesting about this test and why it is always on top.
reply
It's funny.
reply
It really is lol
reply
As often happens with random oddball things that become traditions in web communities, the replies asking what it is or complaining about it begin to gain their own humor value.
reply
Same reason you would always see the same top comments on reddit during a certain era.
reply
That’s what I think too, but we should actively push back against that culture here because HN is not Reddit.
reply
It basically is at this point, if you haven’t noticed. Complete with the same America bad, Elon bad, democrats good midwit progressive politics.
reply
Almost all Musk-related negative news gets [flagged] and never hits the front page, so there is still a silent base on the other "team" apparently.
reply
Don't forget EU bad! Because they won't let Apple screw over consumers.
reply
Elon does suck. Objectively.
reply
Is this a straw man and an ad hominem?
reply
It has become a funny meme, much like "My hovercraft is full of eels!"
reply
Because you still can't ask LLMs to port DOOM to hardware X or Y.
reply
It's a meme, and HN loves upvoting memes. Just like Reddit!
reply
The ultimate measure of an LLM is whether it can produce a capable image of a pelican riding a bicycle. All other use cases are but a distraction!
reply
Do you seriously have a dedicated “bad takes on AI” HN account?
reply
yeah, although I do combine it with "replies to snarky questions" for efficiency
reply
True that
reply