upvote
I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...

  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
    --quality high --size 3840x2160
https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!

I think that image cost 40 cents.

reply
Fed into a clear Claude Code max effort session with : "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio.". It sliced the image into small sections and gave:

"Found the raccoon holding a ham radio in waldo2.png (3840×2160).

  - Raccoon center: roughly (460, 1680)                                                                                            
  - Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)                                         
  - Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780                                                                
                                                                                                                                   
  It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest. "
Which is correct!
reply
I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon.
reply
simonw posted 2 different images: make sure to look at the second one.
reply
Yeah, I noticed that just now, but too late to delete the comment :p
reply
You had a meta problem, and three, in total: find the raccoon, find the umbrella, find the right link in the comments.
reply
To find Waldo you must first create the Universe.
reply
We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated to first. Hopefully that's just luck and not an indictment of the image-creating ability, as if there is some element missing from this "Where's Waldo" image, that would normally make Waldo hard to find.
reply
There seemed to be more space around the raccoon than most other subjects. Zoomed out it appears as almost a “halo” highlighting the raccoon.
reply
Funny how it can look convincing from far away but once you zoom in you find out most characters have a mix of leprosy and skin cancer.
reply
A startling number of people either have no arms, one arm, a half of an arm, or a shrunken arm; how odd!
reply
To be fair, the average person has fewer than two arms.
reply
Most people have an ARM in their pockets, nowadays. And possibly on their wrist.
reply
Haha. Underrated comment!
reply
There id a leg that sprouts into part of bush, perhaps that's where people's legs are disappearing to.
reply
This is why they're congregating around the first aid and the lost and found
reply
Finding the raccoon was instant. Finding all the weird AI artifacts is more fun. It's quite fascinating really. As usual it looks impressive at a glance but completely falls apart on closer inspection. I also didn't find any jokes, unless maybe the bridge to nowhere or finger posts pointing both ways counts?
reply
The faces...that's nice that it turned a kid's book into an abomination
reply
By image generation standards this is a ridiculously good result. No surprise that people instantly find the new limits, but they are new limits.
reply
But it's also straight up plagiarism and still ridiculously bad on so many levels.
reply
It could already copy the art styles from its training data, what is the advancement here?
reply
It's interesting that the raccoon is well defined because it was a part of the request. But none of the other Fauna are.
reply
it's interesting, zoomed out it kind of looks ok, zoomed in.... oh my.
reply
The real NFTs where the images we generated along the way
reply
The people in this image remind me of early this person does not exist, in the best way
reply
fair point, also "this raccoon does not exist"
reply
I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me.

https://postimg.cc/wyxgCgNY

reply
Nice, enjoyed the image as someone who has been to the events. But also easy raccoon placement :)
reply
mmmm yummy OSLS?
reply
Can it generate non halloween version though?

This lower-is-better danse macabre, nightmares inducing ratio feels like interesting proxy for models capability.

reply
deleted
reply
I found it on the 2nd image! On the 1st one not yet...
reply
Cost me < 1 cents - https://elsrc.com/elsrc/waldo/wojak.jpg

And this medium quality, high resolution https://elsrc.com/elsrc/waldo/10_wojaks.jpg was 13cents

p.s. aaaand that's soft launch my SaaS above, you can replace wojak.jpg with anything you want and it will paint that. It's basically appending to prompt defined by elsrc's dashboard. Hopefully a more sane way to manage genai content. Be gentle to my server, hn!

reply
>I think that image cost 40 cents.

Kinda made me sad assuming the author didn't license anything to OpenAI.

I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.

$.40 does not represent the appropriate value to me considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it’s fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open weight models comes in.)

reply
License what? The concept of a hidden object search? The only stylistic similarity here is the viewing angle. Where’s Waldo comics are flat, brightly colored line drawings that look nothing like this at all.
reply
Well, I recognized the style from even the new physical books on sale today, but I don’t know art well enough to use a term like flat.

I am not an art expert but I’m perhaps a reasonable consumer and there is possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, maybe until I take a closer look.

reply
deleted
reply
> though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure

I see an opportunity for a new AI test!

reply
There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.

It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo Highlander style), while also holding up to scrutiny when you examine any individual, ordinary figure.

reply
I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.
reply
deleted
reply
Really hard to look at these images given how not human like the humans are. A few are ok, but a lot are disfigured or missing parts and its hard to find a raccoon in here.
reply
Thanks for the image, I will see their faces in my nightmares.
reply
This happens all too frequently when you ask a GenAI model to create an image with a large crowd especially a “Where’s Waldo?” style scenes, where by definition you’re going to be examining individual faces very closely.
reply
What about the faces of the people ChatGPT killed?
reply
Like... this has things that AI will seemingly always be terrible at?

At some point the level of detail is utter garbo and always will be. An artist who was thoughtful could have some mistakes but someone who put that much time into a drawing wouldn't have:

- Nightmarish screaming faces on most people

- A sign that points seemingly both directions, or the incorrect one for a lake and a first AID tent that doesn't exist

- A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...

It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...

We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos??

reply
AI will seemingly always be ...

You do realize that the whole image generation field is barely 10 years old?

I remember how I was able to generate mnist digits for the first time about 10 years ago - that seemed almost like magic!

reply
The second 4K image definitely has a raccoon on the left there! Nice.
reply
That is a devilishly difficult prompt for current diffusion tasks. Kudos.
reply
haha took me a while to notice that one of the buildings is labelled 'Ham radio'
reply
I see the raccoon
reply
Damn. There’s a fun game app to make here ^^
reply
Is there? The moment you look closely at the puzzle (which is... the whole point of Where's Waldo), you notice all the deformities and errors.
reply
Yes, it’s not there yet. But nothing unsolvable. First thing that comes to mind would be generating smaller portion at the same resolution, then expand through tiling (although one might need to use another service & model for this), like we used to do with Stable Diffusion years ago.

Another option would be generating these large images, splitting them into grids, and using inpainting on each "tile" to improve the details. Basically the reverse of the first one.

Both significantly increase costs, but for the second one having what Images 2.0 can produce as an input could help significantly improve the overall coherence.

reply
Yes sounds more like a fun research project instead.
reply
deleted
reply
deleted
reply
5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."

(I don't think it's right).

reply
I tried

> please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist

and got this. I'm not sure I know what a ham radio looks like though.

https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...

reply
Also, the racoon it circled isn't in the original.
reply
I love how perfectly this captures the difficulties of using generative AI for detection tasks.
reply
Oh god yes, I've been trying to make a LLM Assisted Magic the Gathering card scanner... its been a hell of a time trying to get it to just OCR card names well....
reply
Why would you use an LLM for OCR?
reply
Because apparently that's what programming is and can only be these days...
reply
Indeed. I suppose one way to ensure you can find Waldo in any image is to add it yourself.
reply
hilarious - i tried and got the same thing.

there was a very large bear in the first image; when asked to circle the raccoon it just turned the bear into a giant raccoon and circled it.

reply