https://files.catbox.moe/r3oru2.png
- My Qwen 3.6 result had a sun and a cloud in the sky, similar to the second Opus 4.7 result in Simon's post.
- My Qwen 3.6 result had no grass (except as a green line), but all three results in Simon's post had grass (thick).
- My Qwen 3.6 result had visible "tailing air motion" like Simon's Qwen 3.6 result.
- My Qwen 3.6 result had a "sun with halo" effect that none of Simon's results had.
But, I know, it's more about the pelican and the bicycle.
I can't comment on that flamingo.
Is the 20.9GB GGUF version better or negligible in comparison?
I just tried this GGUF with llama.cpp in its UD Q4_K_XL version on my custom agentic-oriented task consisting of wiki exploration and automatic database building ( https://github.com/GistNoesis/Shoggoth.db/ )
I noted a nice improvement over Qwen3.5 in its ability to discover new creatures in the open-ended search task, but I haven't quantified it with numbers yet. It also seems faster, at around 140 tokens/s compared to 100 tokens/s, but that may be due to different configuration options.
A small difference from Qwen3.5: to avoid out-of-memory crashes in multimodal mode, I had to pass --no-mmproj-offload, which disables GPU offload for converting images to tokens; otherwise it would crash on high-resolution images. I also used a quantized KV cache by passing -ctk q8_0 -ctv q8_0, and with a ctx-size of 150000 it only needs 23099 MiB of device memory, which means no partial RAM offloading when I use an RTX 4090.
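For anyone who wants to reproduce that setup, here is a minimal sketch of how the flags mentioned above could be assembled into a command line. The binary name `llama-server`, the `-m` and `--mmproj` arguments, and both file paths are my assumptions (the comment only says "llama.cpp"); only `--no-mmproj-offload`, `-ctk q8_0`, `-ctv q8_0`, and the context size of 150000 come from the comment itself.

```python
import shlex
from typing import List


def build_llama_args(
    model_path: str,
    mmproj_path: str,
    ctx_size: int = 150000,
    binary: str = "llama-server",  # assumption: the comment does not name the binary
) -> List[str]:
    """Assemble an argument list using the flags described in the comment."""
    return [
        binary,
        "-m", model_path,            # hypothetical path to the GGUF model file
        "--mmproj", mmproj_path,     # hypothetical path to the multimodal projector
        "--no-mmproj-offload",       # keep image-to-token conversion off the GPU
        "-ctk", "q8_0",              # quantize the K cache to q8_0
        "-ctv", "q8_0",              # quantize the V cache to q8_0
        "--ctx-size", str(ctx_size), # context size reported in the comment
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    args = build_llama_args("model-UD-Q4_K_XL.gguf", "mmproj.gguf")
    print(shlex.join(args))
```

Building the list in code rather than a shell string makes it easy to pass straight to `subprocess.run` without quoting issues, and to vary the context size when comparing memory use.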
https://simonwillison.net/2025/Nov/13/training-for-pelicans-...
This reminds me of Pictionary. [0] Some people are good and some are really bad.
I am really bad at remembering how items look in my head and fail at drawing in Pictionary. My drawing skills are tied to being able to copy what I see.
I thought that's exactly what they are?
* It's sitting on the tire, not the seat.
* Is that weird white and black thing supposed to be a beak? If so, it's sticking out of the side of its face rather than the center.
* The wheel spokes are bizarre.
* One of the flamingo's legs doesn't extend to the pedal.
* If you look closely at the sunglasses, they're semi-transparent, and the flamingo only has one eye! Or the other eye is just on a different part of its face, which means the sunglasses aren't positioned correctly. Or the other eye isn't.
* (subjective) The sunglasses and bowtie are cute, but you didn't ask for them, so I'd actually dock points for that.
* (subjective) I guess flamingos have multiple tail feathers, but it looks kinda odd as drawn.
In contrast, Opus's flamingo isn't as detailed or fancy, but more or less all of it looks correct.
Simon, any ideas?
https://ibb.co/FLc6kggm (tried here temperature 0.7 instead of pure defaults)
Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
Instruct (or non-thinking) mode for general tasks: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
Instruct (or non-thinking) mode for reasoning tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
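As a rough sketch, the four presets listed above could be kept in a small lookup table so the right set is chosen per request. The names `SAMPLING_PRESETS` and `get_sampling_params`, and the mode/task keys, are hypothetical; the numeric values are copied from the list above. Whether a given inference framework accepts every one of these keys is not something this sketch checks.

```python
from typing import Dict, Tuple

# Values copied from the recommended settings quoted above.
# The (mode, task) key names are illustrative, not an official API.
SAMPLING_PRESETS: Dict[Tuple[str, str], Dict[str, float]] = {
    ("thinking", "general"): {
        "temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0,
        "presence_penalty": 1.5, "repetition_penalty": 1.0,
    },
    ("thinking", "coding"): {
        "temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0,
        "presence_penalty": 0.0, "repetition_penalty": 1.0,
    },
    ("instruct", "general"): {
        "temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0,
        "presence_penalty": 1.5, "repetition_penalty": 1.0,
    },
    ("instruct", "reasoning"): {
        "temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0,
        "presence_penalty": 1.5, "repetition_penalty": 1.0,
    },
}


def get_sampling_params(mode: str, task: str) -> Dict[str, float]:
    """Return a copy of the preset for (mode, task); raise on unknown pairs."""
    try:
        return dict(SAMPLING_PRESETS[(mode, task)])
    except KeyError:
        raise ValueError(f"no preset for mode={mode!r}, task={task!r}") from None


if __name__ == "__main__":
    print(get_sampling_params("thinking", "coding"))
```

Returning a copy means a caller can override a single value, for example trying temperature 0.7, without mutating the shared table.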
(Please note that the support for sampling parameters varies according to inference frameworks.)
The right one looks much better, plus adding sunglasses without prompting is not that great. Hopefully it won't add some backdoor to the generated code without asking. ;)
GLM-5.1 added a sparkling earring to a north Virginia opossum the other day and I was delighted: https://simonwillison.net/2026/Apr/7/glm-51/
If we want to get nitty gritty about the details of a joke, a flamingo probably couldn't physically sit on a unicycle's seat and also reach the pedals anyways.
Stylized gradients on the flamingo
Flowers
Ground/grass has a stylized look and feel
...despite a miss along the Y-axis where it's below the seat, a couple of oddly organized tail feathers, and the spokes, the composition overall is much closer to a production-quality entity
Opus 4.7 looks like 20 seconds in MS paint.
Qwen3.6 looks incomplete due to the sitting position, but like a WIP I could see on a designer coworker's screen if I walked up and interrupted them. Click and drag it up, adjust the tail feathers and spokes, and you're at, or much closer to, a usable output
I'm impressed by the reach of your blog, and I'm hoping to get into blogging about similar things. I currently have a lot on my backlog to blog about.
In short, keep up the good work with an interesting blog!