I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...

  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
    --quality high --size 3840x2160

https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!

I think that image cost 40 cents.

by makira17 hours ago|

parent|

[-]

Fed it into a fresh Claude Code max-effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio." It sliced the image into small sections and gave:

"Found the raccoon holding a ham radio in waldo2.png (3840×2160).

  - Raccoon center: roughly (460, 1680)
  - Ham radio (walkie-talkie) center: roughly (505, 1650); antenna tip around (510, 1585)
  - Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780

  It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest."

Which is correct!
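
For anyone curious what "sliced the image into small sections" could look like, here is a minimal sketch of grid slicing plus mapping a hit back to full-image pixels. The 3840x2160 size and the raccoon center come from the comment above; the 4x3 grid and every function name are my own assumptions, not what Claude Code actually ran.

```python
from typing import Iterator, NamedTuple


class Tile(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def grid_tiles(width: int, height: int, cols: int, rows: int) -> Iterator[Tile]:
    """Yield non-overlapping tiles covering a width x height image, row by row."""
    for r in range(rows):
        for c in range(cols):
            yield Tile(
                left=c * width // cols,
                top=r * height // rows,
                right=(c + 1) * width // cols,
                bottom=(r + 1) * height // rows,
            )


def to_global(tile: Tile, local_x: int, local_y: int) -> tuple[int, int]:
    """Convert a point found inside a tile back to full-image pixel coordinates."""
    return tile.left + local_x, tile.top + local_y


def tile_containing(tiles: list[Tile], x: int, y: int) -> Tile:
    """Return the tile that contains the full-image point (x, y)."""
    for t in tiles:
        if t.left <= x < t.right and t.top <= y < t.bottom:
            return t
    raise ValueError("point lies outside the image")


# Image size is from the thread; the 4x3 grid is an assumed, illustrative choice.
tiles = list(grid_tiles(3840, 2160, cols=4, rows=3))
raccoon = (460, 1680)  # raccoon center reported in the comment above
tile = tile_containing(tiles, *raccoon)
local = (raccoon[0] - tile.left, raccoon[1] - tile.top)
```

With this grid the reported center lands in the bottom-left tile, which matches the "lower-left area" description. The only step that matters for accuracy is adding the tile offset back in `to_global`.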

by cwillu17 hours ago|

parent|

[-]

I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon.

by makira17 hours ago|

parent|

[-]

simonw posted 2 different images: make sure to look at the second one.

by cwillu17 hours ago|

parent|

[-]

Yeah, I noticed that just now, but too late to delete the comment :p

by jaggederest15 hours ago|

parent|

[-]

You had a meta problem, and three, in total: find the raccoon, find the umbrella, find the right link in the comments.

by bombcar1 hours ago|

parent|

[-]

To find Waldo you must first create the Universe.

by M3L0NM4N14 hours ago|

parent|

prev|

[-]

We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated to first. Hopefully that's just luck and not an indictment of the image-creating ability, as if there is some element missing from this "Where's Waldo" image, that would normally make Waldo hard to find.

by nerdsniper12 hours ago|

parent|

[-]

There seemed to be more space around the raccoon than most other subjects. Zoomed out it appears as almost a “halo” highlighting the raccoon.

by prmoustache6 hours ago|

parent|

prev|

[-]

Funny how it can look convincing from far away but once you zoom in you find out most characters have a mix of leprosy and skin cancer.

by wewtyflakes14 hours ago|

parent|

prev|

[-]

A startling number of people either have no arms, one arm, a half of an arm, or a shrunken arm; how odd!

by rattlesnakedave12 hours ago|

parent|

[-]

To be fair, the average person has fewer than two arms.

by cozzyd8 hours ago|

parent|

[-]

Most people have an ARM in their pockets, nowadays. And possibly on their wrist.

by floodfx11 hours ago|

parent|

prev|

[-]

Haha. Underrated comment!

by ehnto6 hours ago|

parent|

prev|

[-]

There is a leg that sprouts into part of a bush, perhaps that's where people's legs are disappearing to.

by cozzyd11 hours ago|

parent|

prev|

[-]

This is why they're congregating around the first aid and the lost and found

by globular-toast8 hours ago|

parent|

prev|

[-]

Finding the raccoon was instant. Finding all the weird AI artifacts is more fun. It's quite fascinating really. As usual it looks impressive at a glance but completely falls apart on closer inspection. I also didn't find any jokes, unless maybe the bridge to nowhere or finger posts pointing both ways counts?

by davebren17 hours ago|

parent|

prev|

[-]

The faces...that's nice that it turned a kid's book into an abomination

by Filligree13 hours ago|

parent|

[-]

By image generation standards this is a ridiculously good result. No surprise that people instantly find the new limits, but they are new limits.

by globular-toast7 hours ago|

parent|

[-]

But it's also straight up plagiarism and still ridiculously bad on so many levels.

by davebren12 hours ago|

parent|

prev|

[-]

It could already copy the art styles from its training data; what is the advancement here?

by vaulstein9 hours ago|

parent|

prev|

[-]

It's interesting that the raccoon is well defined because it was part of the request, but none of the other fauna are.

by keithnz11 hours ago|

parent|

prev|

[-]

it's interesting, zoomed out it kind of looks ok, zoomed in.... oh my.

by jdironman10 hours ago|

parent|

prev|

[-]

The real NFTs were the images we generated along the way

by louiereederson17 hours ago|

parent|

prev|

[-]

The people in this image remind me of early "This Person Does Not Exist", in the best way

by dfee15 hours ago|

parent|

[-]

fair point, also "this raccoon does not exist"

by gpt514 hours ago|

parent|

prev|

[-]

I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me.

https://postimg.cc/wyxgCgNY

by luxpir7 hours ago|

parent|

[-]

Nice, enjoyed the image as someone who has been to the events. But also easy raccoon placement :)

by djmips8 hours ago|

parent|

prev|

[-]

mmmm yummy OSLS?

by mirekrusin13 hours ago|

parent|

prev|

[-]

Can it generate a non-Halloween version though?

This lower-is-better danse-macabre, nightmare-inducing ratio feels like an interesting proxy for a model's capability.

by ireadmevs17 hours ago|

parent|

prev|

[-]

I found it on the 2nd image! On the 1st one not yet...

by dzhiurgis9 hours ago|

parent|

prev|

[-]

Cost me < 1 cent - https://elsrc.com/elsrc/waldo/wojak.jpg

And this medium-quality, high-resolution one, https://elsrc.com/elsrc/waldo/10_wojaks.jpg, was 13 cents.

P.S. Aaaand that's a soft launch of my SaaS above: you can replace wojak.jpg with anything you want and it will paint that. It's basically appending to a prompt defined in elsrc's dashboard. Hopefully a more sane way to manage genai content. Be gentle to my server, HN!

by Barbing10 hours ago|

parent|

prev|

[-]

>I think that image cost 40 cents.

Kinda made me sad assuming the author didn't license anything to OpenAI.

I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.

$.40 does not represent the appropriate value to me considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it’s fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open weight models comes in.)

by rafram9 hours ago|

parent|

[-]

License what? The concept of a hidden object search? The only stylistic similarity here is the viewing angle. Where’s Waldo comics are flat, brightly colored line drawings that look nothing like this at all.

by Barbing8 hours ago|

parent|

[-]

Well, I recognized the style from even the new physical books on sale today, but I don’t know art well enough to use a term like flat.

I am not an art expert, but I’m perhaps a reasonable consumer, and there is a possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, maybe until I take a closer look.

by makira18 hours ago|

prev|

[-]

> though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure

I see an opportunity for a new AI test!

by vunderba17 hours ago|

parent|

[-]

There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.

It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo, Highlander-style), while also holding up to scrutiny when you examine any individual, ordinary figure.

by simonw18 hours ago|

parent|

prev|

[-]

I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.

by halamadrid9 hours ago|

prev|

[-]

Really hard to look at these images given how un-humanlike the humans are. A few are OK, but a lot are disfigured or missing parts, and it's hard to find a raccoon in here.

by vova_hn212 hours ago|

prev|

[-]

Thanks for the image, I will see their faces in my nightmares.

by vunderba12 hours ago|

parent|

[-]

This happens all too frequently when you ask a GenAI model to create an image with a large crowd, especially a “Where’s Waldo?”-style scene, where by definition you’re going to be examining individual faces very closely.

by hackable_sand10 hours ago|

parent|

prev|

[-]

What about the faces of the people ChatGPT killed?

by marricks13 hours ago|

prev|

[-]

Like... this has things that AI will seemingly always be terrible at?

At some point the level of detail is utter garbo and always will be. An artist who was thoughtful could have some mistakes but someone who put that much time into a drawing wouldn't have:

- Nightmarish screaming faces on most people

- A sign that seemingly points in both directions, or in the wrong one for a lake, and toward a first aid tent that doesn't exist

- A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...

It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...

We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos??

by p1esk13 hours ago|

parent|

[-]

> AI will seemingly always be ...

You do realize that the whole image generation field is barely 10 years old?

I remember how I was able to generate MNIST digits for the first time about 10 years ago - that seemed almost like magic!

by pants217 hours ago|

prev|

[-]

The second 4K image definitely has a raccoon on the left there! Nice.

by nerdsniper13 hours ago|

prev|

[-]

That is a devilishly difficult prompt for current diffusion tasks. Kudos.

by ritzaco18 hours ago|

prev|

[-]

haha took me a while to notice that one of the buildings is labelled 'Ham radio'

by arealaccount18 hours ago|

prev|

[-]

I see the raccoon

by ElFitz17 hours ago|

prev|

[-]

Damn. There’s a fun game app to make here ^^

by dymk13 hours ago|

parent|

[-]

Is there? The moment you look closely at the puzzle (which is... the whole point of Where's Waldo), you notice all the deformities and errors.

by ElFitz7 hours ago|

parent|

[-]

Yes, it’s not there yet. But nothing unsolvable. First thing that comes to mind would be generating a smaller portion at the same resolution, then expanding through tiling (although one might need to use another service & model for this), like we used to do with Stable Diffusion years ago.

Another option would be generating these large images, splitting them into grids, and using inpainting on each "tile" to improve the details. Basically the reverse of the first one.

Both significantly increase costs, but for the second one having what Images 2.0 can produce as an input could help significantly improve the overall coherence.
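
A rough sketch of the grid-splitting half of the second option, assuming overlapping tiles so that seams can be blended afterwards. The 1024-pixel tile size, the 128-pixel overlap, and the function names are illustrative assumptions; the inpainting call itself is left out because it depends entirely on which service or model is used.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles of size `tile` covering [0, length).

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    pushed back so it ends exactly at the image edge.
    """
    if tile >= length:
        return [0]
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the edge
    return starts


def overlapping_tiles(
    width: int, height: int, tile: int, overlap: int
) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes, row by row, right/bottom exclusive."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


# Image size from the thread; tile size and overlap are assumed values.
boxes = overlapping_tiles(3840, 2160, tile=1024, overlap=128)
```

Each box would be cropped, sent for inpainting with the full image (or a downscaled copy) as context, and pasted back with the overlap feathered. The number of boxes is also a quick proxy for how much the cost multiplies.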

by amelius5 hours ago|

parent|

prev|

[-]

Yes, sounds more like a fun research project instead.

by tptacek18 hours ago|

prev|

[-]

5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."

(I don't think it's right).
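
To check an answer like that against the image, the percentages need converting to pixels. A small sketch, assuming the image is 3840x2160 (the size quoted elsewhere in the thread; the image this answer refers to may be a different size) and using helper names I made up:

```python
def percent_to_pixel(
    pct_x: float, pct_y: float, width: int, height: int
) -> tuple[int, int]:
    """Convert 'N% from the left, M% from the top' to a pixel coordinate."""
    return round(pct_x / 100 * width), round(pct_y / 100 * height)


def rounding_slack(width: int, height: int, step_pct: float = 1.0) -> tuple[float, float]:
    """Half-width, in pixels, of the region that rounds to the same percentage step."""
    return width * step_pct / 200, height * step_pct / 200


# Assumed image size; 57% and 48% are the values quoted in the comment above.
point = percent_to_pixel(57, 48, width=3840, height=2160)
slack = rounding_slack(3840, 2160)
```

Under that assumption the answer points at roughly (2189, 1037), and whole-percent rounding alone leaves about 19 pixels of horizontal and 11 pixels of vertical uncertainty, so it can only ever narrow things down to a small patch.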

by ritzaco18 hours ago|

parent|

[-]

I tried

> please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist

and got this. I'm not sure I know what a ham radio looks like though.

https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...

by jackpirate18 hours ago|

parent|

[-]

Also, the raccoon it circled isn't in the original.
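
One way to catch this mechanically is to diff the annotated image against the original and flag any changed pixel that is not annotation-colored. This is a toy sketch on nested lists of RGB tuples, with an assumed "red enough" threshold and made-up function names; a real generative edit usually re-renders the whole image, so a nonzero tolerance would be needed in practice.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def is_red(p: Pixel) -> bool:
    """Assumed threshold for 'this pixel belongs to the red arrow or circle'."""
    r, g, b = p
    return r >= 200 and g <= 60 and b <= 60


def non_annotation_changes(
    original: Image, edited: Image, tolerance: int = 0
) -> list[tuple[int, int]]:
    """Return (x, y) of pixels that changed but did not become annotation red."""
    if len(original) != len(edited) or any(
        len(a) != len(b) for a, b in zip(original, edited)
    ):
        raise ValueError("images must have identical dimensions")
    suspicious = []
    for y, (row_o, row_e) in enumerate(zip(original, edited)):
        for x, (po, pe) in enumerate(zip(row_o, row_e)):
            differs = max(abs(a - b) for a, b in zip(po, pe)) > tolerance
            if differs and not is_red(pe):
                suspicious.append((x, y))
    return suspicious


# Tiny synthetic example: one legitimate red mark, one invented gray pixel.
original = [[(0, 128, 0)] * 3 for _ in range(3)]
edited = [row[:] for row in original]
edited[0][0] = (255, 0, 0)      # part of the requested annotation
edited[1][2] = (128, 128, 128)  # content that was not in the original
flagged = non_annotation_changes(original, edited)
```

Anything in `flagged` is content the model added or altered rather than an annotation, which is exactly the case of a circled raccoon that was never there.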

by Aurornis17 hours ago|

parent|

[-]

I love how perfectly this captures the difficulties of using generative AI for detection tasks.

by jetbalsa12 hours ago|

parent|

[-]

Oh god yes, I've been trying to make an LLM-assisted Magic: The Gathering card scanner... it's been a hell of a time trying to get it to just OCR card names well....
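
Not something described above, just a commonly used complement to any OCR step: since card names come from a closed list, noisy output can be snapped to the nearest known name. A minimal sketch using Python's standard `difflib`; the sample names, the 0.6 cutoff, and the function name are illustrative assumptions.

```python
import difflib
from typing import Optional


def snap_to_known_name(
    ocr_text: str, known_names: list[str], cutoff: float = 0.6
) -> Optional[str]:
    """Return the known name closest to the OCR text, or None if nothing is close."""
    # Compare case-insensitively, but return the name in its original casing.
    lookup = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        ocr_text.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Hypothetical sample list; a real scanner would load the full card database.
names = ["Lightning Bolt", "Lightning Helix", "Counterspell"]
best = snap_to_known_name("Lightnlng Bo1t", names)
```

This does not fix the recognition itself, but it turns "almost right" strings into exact names and returns `None` when the read is too far off to trust.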

by what10 hours ago|

parent|

[-]

Why would you use an LLM for OCR?

by jubilanti6 minutes ago|

parent|

[-]

Because apparently that's what programming is and can only be these days...

by angiolillo18 hours ago|

parent|

prev|

[-]