No way. The Pro pelican is fatter, has a customized front fork, and the sun is shining! He’s definitely living the best life.
The Pro pelican is a work of art! It goes to dimensions no other LLM has gone before.
Yeah. Look at those 4 feathers (?) on his bum too.
a lot of dumplings
This is just a random thought, but have you tried doing an 'agentic' pelican?

As in: have the model look at its generated SVG and gradually refine it, using its knowledge of the relative positions and proportions of the shapes it generated. Let it spin for a while, and hopefully the end result will be better than one-shotting it.

Or go one step further: most modern models have tool use and image recognition capabilities, so what if you have it generate an SVG (or parts/layers of it, at the model's discretion), feed the rendered result back to itself via image recognition, and then improve on it?

I think it'd be interesting to see, because for a lot of models, one-shot coding ability is not necessarily correlated with in-harness ability, and the latter is what really matters.
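A minimal Python sketch of that loop, assuming two hypothetical hooks that are not any real API: `generate(prompt, previous_svg, feedback)` asks the model for a (revised) SVG, and `critique(svg)` returns `None` when satisfied or a note on what to fix. In the image-recognition variant, `critique` would rasterize the SVG and show the picture to a vision-capable model. The toy stand-ins exist only so the loop runs without a model.

```python
from typing import Callable, List, Optional

# Hypothetical hooks, not a real API:
#   generate(prompt, previous_svg, feedback) -> new SVG text
#   critique(svg) -> None when satisfied, else a note on what to fix.
Generate = Callable[[str, Optional[str], Optional[str]], str]
Critique = Callable[[str], Optional[str]]


def refine_svg(prompt: str, generate: Generate, critique: Critique,
               max_rounds: int = 5) -> List[str]:
    """Return every SVG draft, oldest first; the last one is the final result."""
    drafts = [generate(prompt, None, None)]  # the one-shot attempt
    for _ in range(max_rounds):
        feedback = critique(drafts[-1])
        if feedback is None:  # critic is satisfied, stop early
            break
        drafts.append(generate(prompt, drafts[-1], feedback))
    return drafts


# Toy stand-ins (hypothetical): each revision adds one more wheel.
def toy_generate(prompt: str, previous: Optional[str], feedback: Optional[str]) -> str:
    wheels = 0 if previous is None else previous.count("<circle")
    return "<svg>" + '<circle r="10"/>' * (wheels + 1) + "</svg>"


def toy_critique(svg: str) -> Optional[str]:
    return None if svg.count("<circle") >= 2 else "the bicycle needs two wheels"


drafts = refine_svg("pelican riding a bicycle", toy_generate, toy_critique)
print(len(drafts), drafts[-1])
```

Keeping every draft, rather than only the last, makes it easy to check whether later rounds actually improved on the one-shot attempt, and `max_rounds` stops a critic that is never satisfied from spinning forever.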

I tried that for the GPT-5 launch - a self-improving loop that renders the SVG, looks at it and tries again - and the results were surprisingly disappointing.

I should try it again with the more recent models.

I see, thanks. I guess most current models are not yet trained for this loop.

Could you please try with Opus 4.7? I think there's a chance of it doing well, considering the design/vision focus.

The Flash one is pretty impressive. It might be my favorite so far in the pelican-riding-a-bicycle series.
DeepSeek pelicans are the angriest pelicans I’ve seen so far.
they're just late for work.
They're stressed pelicans from Hangzhou.
996 Pelican, lol
Being a bicycle geometry nerd I always look at the bicycle first.

Let me tell you how much the Pro one sucks... It looks like a failed Pedersen[1]. The rear wheel intersects the bottom bracket, so it wouldn't even roll. Or rather, this bike couldn't exist.

The Flash one looks surprisingly correct, with some wild fork offset and the slackest of seat tubes. It's got some lowrider[2] aspirations with the small wheels, but with longer, Rivendell-ish[3] chainstays. The seat post has a different angle from the seat tube, so good luck lowering that.

[1] https://en.wikipedia.org/wiki/Pedersen_bicycle

[2] https://en.wikipedia.org/wiki/Lowrider_bicycle

[3] https://www.rivbike.com/

This is an excellent comment. Thanks for this - I've only ever thought about whether the frame is the right shape, I never thought about how different illustrations might map to different bicycle categories.
Some other reactions:

I wonder which model will try some more common spoke lacing patterns. Right now there seems to be a preference for radial lacing, which is not super common (but simple to draw). The Flash and Pro ones use 16-spoke rims, which actually exist[1] but are not super common.

The Pro model fails badly at the spokes. Heck, the spokes sit on the outside of the drive side of the rim and tire. Enjoy riding on the spokes (instead of the tire) welded to the side of your rim.

Both bikes have the drive side on the left, which is very, very uncommon. That can't exist in the training data.

[1] https://cicli-berlinetta.com/product/campagnolo-shamal-16-sp...
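Why radial lacing is "simple to draw": each spoke is just a straight line from the hub to the rim along the same angle, with the n spokes spaced 360/n degrees apart. A minimal Python sketch that emits SVG `<line>` elements for such a wheel; the function name and parameters are hypothetical, chosen for illustration. Crossed (tangential) lacing would instead offset each spoke's rim end from its hub end, which is harder to get right.

```python
import math


def radial_spokes(cx: float, cy: float, hub_r: float, rim_r: float, n: int = 16):
    """Hypothetical helper: SVG <line> elements for a radially laced wheel.

    Each spoke starts on the hub circle and ends on the rim circle at the
    same angle, so every spoke points straight out from the axle.
    """
    lines = []
    for k in range(n):
        a = 2 * math.pi * k / n  # evenly spaced angles, 360/n degrees apart
        x1, y1 = cx + hub_r * math.cos(a), cy + hub_r * math.sin(a)
        x2, y2 = cx + rim_r * math.cos(a), cy + rim_r * math.sin(a)
        lines.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}"/>')
    return lines


spokes = radial_spokes(50, 50, hub_r=5, rim_r=40)  # a 16-spoke wheel
print(len(spokes), spokes[0])
```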

The Pedersen looks like someone failed the "draw a bicycle" test and decided to adjust the universe.
I think the pelican on a bike is known widely enough that it ceases to be useful as a benchmark. A pelican even appears briefly in the GPT-5 promo video, if I'm not mistaken: https://openai.com/gpt-5/. So the companies are apparently aware of it.
It was a bigger deal in the Gemini 3.1 launch: https://x.com/JeffDean/status/2024525132266688757
What was your prompt for the image? Apologies if this should be obvious.
>Generate an SVG of a pelican riding a bicycle

It's at the top of the linked pages.

To me this is the perfect proof that

1) LLMs are not AGI. Surely, if they were, Pro would do better than Flash?

2) Because of the above, the pelican example is most likely already being benchmaxxed.

Is this DeepSeek hosted by DeepSeek?

How much does the drawing change if you ask it again?

I really like the Pro version. The pelican is so cute.
Where is the GPT 5.5 Pelican?
Why are they so angry?
[flagged]
It's just Simon Willison (the person you are replying to) who always makes a pelican, as his personal flippant benchmark. It's not that deep.
No benchmark will be perfect, especially a public one, but it's a fun experiment for seeing how these models get better and better.
Why is it so wrong?
Thanks for the "scientific air" remark, that gave me a genuine LOL.
"The difference between screwing around and science is writing it down" -- Adam Savage
This should not be the top comment on every model release post. It's getting tiring.
This should be the bottom comment on the pelican comment on every model release post.
Clearly the top comment should be "Imagine a beowulf cluster of Deepseek v4!"
My mother was murdered by Beowulf, you insensitive Claude!
This was perfect.