Edit: this one has crossed legs lol
Bike frames are very hard to draw unless you've already consciously internalized the basic shape, see https://www.booooooom.com/2016/05/09/bicycles-built-based-on...
https://hcker.news/pelican-low.svg
https://hcker.news/pelican-medium.svg
https://hcker.news/pelican-high.svg
https://hcker.news/pelican-xhigh.svg
Someone needs to make a pelican arena, I have no idea if these are considered good or not.
I gave a talk about it last year: https://simonwillison.net/2025/Jun/6/six-months-in-llms/
It should not be treated as a serious benchmark.
Anyone can look and decide if it’s a good picture or not. But the numeric benchmarks don’t tell you much if you aren’t already familiar with that benchmark and how it’s constructed.
Nowadays I think it's pretty silly, because there's surely SVG drawing training data and some effort from the researchers put onto this task. It's not a showcase of emergent properties.
It's meta-interesting that few if any models actually seem to be training on it. Same with other stereotypical challenges like the car-wash question, which is still sometimes failed by high-end models.
If I ran an AI lab, I'd take it as a personal affront if my model emitted a malformed pelican or advised walking to a car wash. Heads would roll.
(There are some that generate 3d models specifically, more in the image generation family than chatbot family.)
Building software remains really hard. Most people are not going to be able to produce production quality software systems, no matter how good the AI tooling gets.
It continues to amaze me that these models that definitely know what bicycle geometry actually looks like somewhere in their weights produces such implausibly bad geometry.
Also mildly interesting, and generally consistent with my experience with LLMs, that it produced the same obvious geometry issue both times.
I feel like the main problem for the models is that they can't actually look at the visual output produced by their SVG and iterate. I'm almost willing to bet that if they could, they'd absolutely nail it at this point.
Imagine designing an SVG yourself without being able to ever look outside the XML editor!
I honestly think I could do much better on the bicycle without looking at the output (with some assistance for SVG syntax which I definitely don't know), just as someone who rides them and generally knows what the parts are.
I'd do worse at the pelicans though.
I've been contemplating a more fair version where each model gets 3-5 attempts and then can select which rendered image is "best".
Only coherent move at this point: hit the minus button immediately. There's never anything about the model in the thread other than simon's post.