OPENAI_API_KEY="$(llm keys get openai)" \
uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
--quality high --size 3840x2160
https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon! I think that image cost 40 cents.
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
- Raccoon center: roughly (460, 1680)
- Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)
- Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780
It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest."
Which is correct!
This lower-is-better danse macabre, nightmare-inducing ratio feels like an interesting proxy for model capability.
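The pixel coordinates in the quoted answer can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming the box is given as (left, top, right, bottom) in pixels with y growing downward, and using an arbitrary rule-of-thirds split to name the region; `Box` and `describe_region` are hypothetical helpers, not part of any tool mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Pixel box in (left, top, right, bottom) order; y grows downward.
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def center(self) -> tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    def area(self) -> int:
        return (self.right - self.left) * (self.bottom - self.top)


def describe_region(box: Box, width: int, height: int) -> str:
    # Assumption: name the region by which third of the image the box center falls in.
    cx, cy = box.center()
    nx, ny = cx / width, cy / height
    horiz = "left" if nx < 1 / 3 else "right" if nx > 2 / 3 else "center"
    vert = "upper" if ny < 1 / 3 else "lower" if ny > 2 / 3 else "middle"
    return f"{vert}-{horiz}"


WIDTH, HEIGHT = 3840, 2160
RACCOON_BOX = Box(370, 1550, 540, 1780)  # numbers copied from the quoted answer
POINTS = {"raccoon": (460, 1680), "radio": (505, 1650), "antenna tip": (510, 1585)}

if __name__ == "__main__":
    print(describe_region(RACCOON_BOX, WIDTH, HEIGHT))  # lower-left
    for name, (x, y) in POINTS.items():
        print(name, RACCOON_BOX.contains(x, y))  # True for all three
    print(f"{RACCOON_BOX.area() / (WIDTH * HEIGHT):.2%} of the image")  # 0.47% of the image
```

The reported points are at least self-consistent: all three fall inside the box, and the box sits in the lower-left third of the frame while covering under half a percent of it. The tuple order is the same one Pillow's `Image.crop` takes, so the same numbers could be used to crop the region and check it by eye.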
And this medium-quality, high-resolution one was 13 cents: https://elsrc.com/elsrc/waldo/10_wojaks.jpg
P.S. Aaaand that's the soft launch of my SaaS above: you can replace wojak.jpg with anything you want and it will paint that. It basically appends it to a prompt defined in elsrc's dashboard. Hopefully it's a saner way to manage genai content. Be gentle to my server, HN!
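For anyone wondering how "replace the filename and it paints that" might work, here is a rough sketch of the idea. It is NOT elsrc's actual implementation: it assumes the second-to-last path segment ("waldo") selects a base prompt stored somewhere like a dashboard, and the file stem becomes the subject appended to it. `BASE_PROMPTS` and `prompt_from_url` are hypothetical names.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Hypothetical stand-in for prompts configured in a dashboard, keyed by collection.
BASE_PROMPTS = {
    "waldo": "A dense where's Waldo style illustration. Hidden somewhere in the crowd:",
}


def prompt_from_url(url: str, base_prompts: dict[str, str]) -> str:
    # Decode %20 etc. so "a%20raccoon.png" becomes "a raccoon.png".
    path = PurePosixPath(unquote(urlparse(url).path))
    collection = path.parent.name  # e.g. "waldo"
    subject = path.stem.replace("_", " ").replace("-", " ")  # e.g. "10 wojaks"
    if collection not in base_prompts:
        raise KeyError(f"no base prompt configured for {collection!r}")
    # Append the filename-derived subject to the configured base prompt.
    return f"{base_prompts[collection]} {subject}"


if __name__ == "__main__":
    print(prompt_from_url("https://elsrc.com/elsrc/waldo/10_wojaks.jpg", BASE_PROMPTS))
```

A real service would also need to cache generated images per URL, since every new filename implies a fresh (paid) generation.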
It kinda made me sad, assuming the author didn't license anything to OpenAI.
I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.
$0.40 does not represent the appropriate value to me, considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it’s fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open-weight models comes in.)
I am not an art expert, but I’m perhaps a reasonable consumer, and there is a possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, at least until I take a closer look.
I see an opportunity for a new AI test!
It's a difficult test for genai to pass. As I mentioned in a different thread, it requires holistic understanding (there can only be one Waldo, Highlander style) while also holding up to scrutiny when you examine any individual, ordinary figure.
At some point the level of detail is utter garbo and always will be. A thoughtful artist might make some mistakes, but someone who put that much time into a drawing wouldn't have:
- Nightmarish screaming faces on most people
- A sign that seemingly points both directions, or the wrong way to a lake and to a first aid tent that doesn't exist
- A dog in the bottom left, near the lake, that looks like some sort of fuzzy monstrosity...
It looks SO impressive before you try to take in any detail. The hand-selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...
We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier Where's Waldos??
You do realize that the whole image generation field is barely 10 years old?
I remember being able to generate MNIST digits for the first time about 10 years ago - that seemed almost like magic!
Another option would be to generate these large images, split them into grids, and use inpainting on each "tile" to improve the details. Basically the reverse of the first one.
Both significantly increase costs, but for the second one, using what Images 2.0 can produce as the input could significantly improve overall coherence.
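The tiling step of that second approach is easy to pin down. This is a minimal sketch, assuming square tiles of 1024 px with at least 128 px of overlap (both numbers are arbitrary choices, not anything the Images 2.0 API requires); the inpainting call itself is left out because which model or endpoint would do it is an open question.

```python
import math


def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Evenly spaced start offsets so tiles cover [0, length), sharing >= overlap px."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]  # one (clamped) tile covers the whole axis
    stride = tile - overlap
    n = math.ceil((length - overlap) / stride)  # fewest tiles that respect the overlap
    span = length - tile
    # Spread tiles evenly so the first starts at 0 and the last ends flush with the edge.
    return [round(i * span / (n - 1)) for i in range(n)]


def tile_boxes(width: int, height: int, tile: int = 1024, overlap: int = 128):
    """(left, top, right, bottom) boxes in row-major order, clamped to the image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    boxes = tile_boxes(3840, 2160)
    print(len(boxes))  # 15
```

For a 3840×2160 image this gives a 5×3 grid, i.e. 15 extra edit calls per picture on top of the base generation, which is where the cost increase comes from. Each box would be cropped, inpainted, and pasted back, blending across the overlapping strips so the seams don't show.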
(I don't think it's right).
> please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist
and got this. I'm not sure I know what a ham radio looks like though.
https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...
There was a very large bear in the first image; when asked to circle the raccoon, it just turned the bear into a giant raccoon and circled it.