Inspired by this, I tried something much simpler. I asked it to draw 12 concentric circles. Across three tries, it drew 10 every time. https://chatgpt.com/share/69e87d08-5a14-83eb-9a3b-3a8eb14692...
It can't get that in one shot. Perhaps, though, it could figure out when it needs to break a problem into individual tasks, delegate those to itself, and assemble them at the end.
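The appeal of this prompt is that the decomposed version is trivial and the result is mechanically checkable. A minimal sketch (the radii and SVG layout are arbitrary choices, not from the original prompt) of "one circle per subtask":

```python
# Emit 12 concentric circles as SVG, one <circle> element per radius.
# Each radius is a self-contained "subtask"; the final count is checkable.
RADII = range(10, 130, 10)  # 12 radii: 10, 20, ..., 120

circles = "\n".join(
    f'  <circle cx="150" cy="150" r="{r}" fill="none" stroke="black"/>'
    for r in RADII
)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="300" height="300">\n'
    f"{circles}\n</svg>"
)

# Verifying the output is just counting elements.
print(svg.count("<circle"))
```

The point isn't that the model should write this exact code, but that a task with a countable success criterion is easy to verify piecewise, which one-shot raster generation apparently isn't.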
Color charcoal drawings do exist, but they're not what's usually meant by a "charcoal drawing".
(source: https://chatgpt.com/share/69e83569-b334-8320-9fbf-01404d18df...)
Artistic oddities aside (why are the 8-bit sprites 16-bit? why do the charcoal drawings have colour? why does the art of specifically the Gen 1 Pokémon look so off?), 271 is Lombre, not Lotad.
I have more subjective prompts for testing reasoning, but they're your-mileage-may-vary. (That said, gpt-2-image has surprisingly been doing much better on the more objective criteria in my test cases.)
We have enough people complaining about Simon Willison's pelican test.
Try things like: "A white capybara with black spots, on a tricycle, with 7 tentacles instead of legs, each tentacle a different color of the rainbow" (paraphrased, not the literal exact prompt I used).
Gemini just glommed a whole mass of tentacles together without any regard for the count.
This example image was generated using the API on high reasoning, not the low-reasoning version. (It's slow and takes about 2 minutes lol.)
The reasoning amount is part of the evaluation, isn't it?