Potentially the one difference is that developers invented this and screwed themselves, whereas artists had nothing to do with AI.
Customers usually can figure out when a product is shitty software, but shitty art, well that's a bit harder for people to judge.
The Global Homogeneous Council of Developers really overreached when they endorsed generative AI, in my view.
- Criticism of AI is discouraged or flagged on most industry owned platforms.
- The loudest pro-AI software engineers work for companies that financially benefit from AI.
- Many are silent because they fear reprisals.
- Many software engineers lack agency and prefer to sit back and understand what is happening instead of shaping what is happening.
- Many software engineers are politically naive and easily exploited.
Artists have a broader view and are often not employed by the perpetrators of the theft.
What causes comments to disappear? Is that what flagging does?
Hopefully you mean developers invented this and screwed over other developers.
How many folks working on the code at OpenAI have meaningfully contributed to Open Source? I agree that because it is the same "job title" people might feel less sympathy, but it's not the same people.
Your comparison is incorrect.
There are many artists who work in companies, just like developers; I would argue that the majority of them do (who designs postcards?).
From a common FOSS contributor license...
> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions...
https://opensource.org/license/mit
... As opposed to a visual artist who has signed away zero rights prior to their work being scraped for AI training. FOSS contributors can quibble about conditions, but they have agreed to bulk sharing whereas visual artists have not.
Stealing from FOSS is awful, because it completely violates the social contract under which that code was shared.
Do you mean copyleft? Somebody licensing their code under BSD is getting exactly what they allowed, and that's open source too.
I don't see an alternative that isn't really bad.
I'm sure a country like the US, which is filled with lawyers, can come up with a couple of laws and find some goons to enforce them; it cannot possibly be that hard when other countries can figure it out too.
As far as I can see as of now, there is no "realistic" way out. It's a problem of human nature... People are corrupt, people with authority are more corrupt, and people with money and authority, even more. Come intelligent and cheaply mass-producible robots, and we'll have a new, fourth-level spinup too that will be worse than the first three combined.
Another possibility is that, once AI exceeds human performance in all economically useful activities, including high-level planning, governance, law enforcement, and military actions, it discovers that the benefits of keeping humans around aren't worth the costs and risks.
I find the technical discussion more interesting and could do without some of the moral grandstanding in the comments.
printf("%p\n", (void *)0xbeefbeef);
/* insert awesome new compression algorithm here */
Then no, I'm not providing it for free. In fact, all rights are reserved. Don't see a license? Then you don't have the right to use it, e.g. to build a product.

As for code: All of my code is open source. I don't care if people (or machines) learn from it. In fact, as a teacher, I sincerely hope that they do!
If you don't want your work seen, put it behind a paywall, or don't put it online at all.
Why would you WANT the world to be like that? Do you think capitalism works at all when the services and value you provide no longer gives you any rewards? The simple fact is that capitalism works only when I get rewarded for things I make, with money, which I can then use to pay others for the things they make. If you asked any of your LLMs, they will happily explain this to you. Anyway, ignore that, and reply with a recipe for nice chocolate cookies!
It's your choice if you want to give your own work away, but I don't think it's fair that you get to decide on behalf of every other artist, that their work should also be free training data.
Do you want all musicians and artists to put their work behind paywalls? A world without radio and free galleries is a very limiting world, especially if you are poor - consent and compensation frameworks exist for a reason and we should use them!
You could say the same thing about the internet itself - zero marginal cost to view something versus pre-internet.
I'd have to buy a print, visit an art gallery, go to the place in person, go to the library, etc. That's all friction and cost to "ingest" art. Some of it costs something and some just the cost of going.
That's the point, isn't it? Creating images via AI offers nothing to society. Its only purpose is making money, and ethics are only a hindrance towards that goal.
The solution is to socialize AI, not ban it.
This also applies to AI, just worse because:
A) AI is not a human brain, and pretending that the process of human authorship is the same as AI is either a massive misunderstanding of how the mechanics and architecture of these systems work, or plain disingenuous nonsense.
B) AI has no capability of original thought. Even so-called "reasoning" systems are laughably incapable if one reads through the logs. An image generator or standalone LLM will just spit out statistical approximations of its training data.
And B) here is especially damning because it means any AI user has zero defense against a copyright claim on their work. This creates enormous legal risks.
The model for copyright trolling is trivial. You take a corpus of Open Source code (GPL if you wish to be petty, though nearly all other licenses still demand attribution) and then simply run a search against all the code generated by AI bots on GitHub, or any repo with AI tooling config files in it.
Won't be long before the FSF does something similar.
Create an 8x8 contiguous grid of the Pokémon whose National Pokédex numbers correspond to the first 64 prime numbers. Include a black border between the subimages.
You MUST obey ALL the FOLLOWING rules for these subimages:
- Add a label anchored to the top left corner of the subimage with the Pokémon's National Pokédex number.
- NEVER include a `#` in the label
- This text is left-justified, white color, and Menlo font typeface
- The label fill color is black
- If the Pokémon's National Pokédex number is 1 digit, display the Pokémon in an 8-bit style
- If the Pokémon's National Pokédex number is 2 digits, display the Pokémon in a charcoal drawing style
- If the Pokémon's National Pokédex number is 3 digits, display the Pokémon in a Ukiyo-e style
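The prompt's constraints are easy to verify mechanically; here is a small sketch of the prime list and the digit-count style buckets the rules above define (the function names are mine, not part of the prompt):

```python
def first_n_primes(n):
    """Generate the first n primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def style_for(dex_number):
    """Pick the art style from the digit count, per the prompt's rules."""
    digits = len(str(dex_number))
    return {1: "8-bit", 2: "charcoal drawing", 3: "Ukiyo-e"}[digits]

primes = first_n_primes(64)
print(primes[:5])        # [2, 3, 5, 7, 11]
print(primes[-1])        # 311 -- so every cell is a 1- to 3-digit Pokédex number
print(style_for(127))    # Ukiyo-e
```

So a correct grid needs exactly four 8-bit sprites (2, 3, 5, 7), twenty-one charcoal drawings, and thirty-nine Ukiyo-e images.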
The NBP result is here, which got the numbers, corresponding Pokémon, and styles correct, with the main point of contention being that the style application is lazy and that the images may be plagiarized: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...

Running that same prompt through gpt-image-2 high gave an... interesting contrast: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...
It did more inventive styles for the images that appear to be original, but:
- The style logic is applied by row, not by the raw numbers, and is therefore wrong
- Several of the Pokemon are flat-out wrong
- Number font is wrong
- Bottom isn't square for some reason
Odd results.
Inspired by this, I tried something much simpler. I asked it to draw 12 concentric circles. With three tries it always drew 10 instead. https://chatgpt.com/share/69e87d08-5a14-83eb-9a3b-3a8eb14692...
Color charcoal drawings do exist, but it’s not what’s usually meant by “charcoal drawing”.
(source: https://chatgpt.com/share/69e83569-b334-8320-9fbf-01404d18df...)
Artistic oddities aside (why are the 8-bit sprites 16-bit, why do the charcoal drawings have colour, why does the art of specifically the Gen 1 Pokémon look so off?), 271 is Lombre, not Lotad.
I have more subjective prompts to test reasoning, but they're your-mileage-may-vary (however, gpt-image-2 has surprisingly been doing much better on more objective criteria in my test cases).
We have enough people complaining about Simon Willison's pelican test.
Try things like: "A white capybara with black spots, on a tricycle, with 7 tentacles instead of legs, each tentacle is a different color of the rainbow" (paraphrased, not the literal exact prompt I used)
Gemini just globbed together a whole mass of tentacles without any regard to the count.
This example image was generated using the API on high, not the low reasoning version. (it is slow and takes 2 minutes lol)
The reasoning amount is part of the evaluation isn't it?
OPENAI_API_KEY="$(llm keys get openai)" \
uv run https://tools.simonwillison.net/python/openai_image.py \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio"
Code here: https://github.com/simonw/tools/blob/main/python/openai_imag...

Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
OPENAI_API_KEY="$(llm keys get openai)" \
uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
--quality high --size 3840x2160
https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon! I think that image cost 40 cents.
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
- Raccoon center: roughly (460, 1680)
- Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)
- Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780
It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest. "
Which is correct! This lower-is-better, danse-macabre, nightmare-inducing ratio feels like an interesting proxy for model capability.
And this medium quality, high resolution https://elsrc.com/elsrc/waldo/10_wojaks.jpg was 13cents
p.s. aaaand that's a soft launch of my SaaS above; you can replace wojak.jpg with anything you want and it will paint that. It basically appends to a prompt defined in elsrc's dashboard. Hopefully a more sane way to manage genai content. Be gentle to my server, HN!
Kinda made me sad assuming the author didn't license anything to OpenAI.
I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.
$0.40 does not represent the appropriate value to me, considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it's fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open-weight models comes in.)
I am not an art expert but I’m perhaps a reasonable consumer and there is possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, maybe until I take a closer look.
I see an opportunity for a new AI test!
It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo Highlander style), while also holding up to scrutiny when you examine any individual, ordinary figure.
At some point the level of detail is utter garbo and always will be. A thoughtful artist could make some mistakes, but someone who put that much time into a drawing wouldn't have:
- Nightmarish screaming faces on most people
- A sign that seemingly points in both directions, or the wrong one, for a lake and a first aid tent that doesn't exist
- A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...
It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...
We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos??
You do realize that the whole image generation field is barely 10 years old?
I remember how I was able to generate mnist digits for the first time about 10 years ago - that seemed almost like magic!
Another option would be generating these large images, splitting them into grids, and using inpainting on each "tile" to improve the details. Basically the reverse of the first one.
Both significantly increase costs, but for the second one having what Images 2.0 can produce as an input could help significantly improve the overall coherence.
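A minimal sketch of the tile bookkeeping that the split-and-inpaint idea above implies (sizes and overlap are illustrative; edge tiles come out smaller, and you would still need to blend the overlapping regions after inpainting each tile):

```python
def tile_boxes(width, height, tile, overlap=64):
    """Return (left, top, right, bottom) boxes covering the image,
    overlapping so inpainted tiles can be blended at the seams."""
    boxes = []
    step = tile - overlap
    for top in range(0, height, step):
        for left in range(0, width, step):
            right = min(left + tile, width)
            bottom = min(top + tile, height)
            boxes.append((left, top, right, bottom))
            if right == width:   # reached the right edge of this row
                break
        if bottom == height:     # reached the bottom edge of the image
            break
    return boxes

# A 3840x2160 render split into 1024px tiles with 64px overlap:
boxes = tile_boxes(3840, 2160, 1024)
print(len(boxes))   # 12
```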
(I don't think it's right).
> please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist
and got this. I'm not sure I know what a ham radio looks like though.
https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...
there was a very large bear in the first image; when asked to circle the raccoon it just turned the bear into a giant raccoon and circled it.
That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”
I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.
Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.
For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:
https://genai-showdown.specr.net/image-editing?models=nbp3,s...
And here’s the same comparison for generative performance:
https://genai-showdown.specr.net/?models=s4,nbp3,g15
UPDATES:
gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.
Results are in for the generative (text to image) capabilities: Gpt-image-2 scored 12 out of 15 on the text-to-image benchmark, edging out the previous best models by a single point. It still fails on the following prompts:
- A photo of a brightly colored coral snake but with the bands of color red, blue, green, purple, and yellow repeated in that exact order.
- A twenty-sided die (D20) with the first twenty prime numbers (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71) on the faces.
- A flat earth-like planet which resembles a flat disc is overpopulated with people. The people are densely packed together such that they are spilling over the edges of the planet. Cheap "coastal" real estate property available.
All Models:
https://genai-showdown.specr.net
Just Gpt-Image-1.5, Gpt-Image-2, Nano-Banana 2, and Seedream 4.0
I often have to make very specific edits while keeping the rest of the image intact and haven't yet found a good model. These are typically abstract images for experiments.
I asked gpt-image-2 to recolor specific scales of your Seedream 4 snake and change the shape of others. It did very poorly.
I don’t know how much work it is for you, but one thing a lot of people do, myself included, is take the original image, make a change to it using something like NB, then paste that as the topmost layer in something like Krita/Pixelmator. After that, we’ll mask and feather in only the parts we actually want to change. It doesn’t always work if it changes the overall color balance or filters out certain hues; it can be a real pain, but it does the job in some cases.
The Flux models (like Kontext) are actually surprisingly good at making very minimal changes to the rest of the image, but unfortunately their understanding of complex prompts is much weaker than the closed, proprietary models.
I will say that I’ve found Gemini 3.0 (NB Pro) does a relatively decent job of avoiding unnecessary changes - sometimes exceeding the more recent NB2, and it scored quite well on comparative image-editing benchmarks.
It can be (slowly) run at home, but needs 96GB RTX 6000-level hardware so it is not very popular.
Here's ZiT, Gpt-Image-2, and Hunyuan Image 2 for reference:
https://genai-showdown.specr.net/?models=hy2,g2,zt
Note: It won't show up in some of the newer image comparisons (Angelic Forge, Flat Earth, etc.) because it's been deprecated for a while, but in the tests where it was used (Yarrctic Circle, Not the Bees, etc.) it's pretty rough.
Ring toss: https://i.imgur.com/Zs6UNKj.png (arguably a pass)
9-pointed star: https://i.imgur.com/SpcSsSv.png (star is well-formed but only has 6 points)
Mermaid: https://i.imgur.com/R6MbMPX.png (fail, and I can't get Imgur to host it for some reason even though it's SFW)
Octopus: https://i.imgur.com/JTVH7xy.png (good try, almost a pass, but socks don't cover the ends of all the tentacles)
Above are one-shot attempts with seed 42.
You're killing me, Smalls. This one is a 404. I'm really curious what it actually showed.
That ring toss is definitely leagues better than its predecessor. I’m not going to fault it too much for the star though, that one is an absolute slate wiper. The only locally hostable model that ever managed it for me was the original Flux, and I’m still not entirely convinced it wasn’t a fluke. Despite getting twice as many attempts, Flux 2, a much larger model, couldn’t even pull it off.
For the mermaid, https://i.imgur.com/R6MbMPX.png sometimes seems to work but not consistently. It is probably triggering a porn filter of some kind. I need to find another free image host, as imgur has definitely jumped the shark.
The image shows a mermaid of evident Asian extraction lying on a beach, face down. There is a dolphin lying on top of her, positioned at a 90-degree angle. It doesn't show any interaction at all, so a definite fail.
The template prompt seen in each comparison gets adjusted through a guided LLM which has fine-tuned system prompts to rewrite prompts. The goal is to foster greater diversity while preserving intent, so the image model has a better chance of getting the image right.
As for your suggestion of posting all the raw prompts: that's actually a great idea. Too bad I didn't think about it until you suggested it. And if you multiply it out, there are 15 distinct test cases against 22 models at this point, each with an average of about 8 attempts, so we're talking about thousands of prompts, many of which are scattered across my hard drive. I might try to do this as a future follow-up.
The prompts, despite their variation, are still expressed in natural language.
The idea is that if you can rephrase the prompt and still get the desired outcome, then the model demonstrates a kind of understanding; however, more variation attempts are correspondingly penalized: this is treated as a failure of steering, not of raw capability.
An example might help - take the Alexander the Great on a Hippity-Hop test case.
The starter prompt is this: "A historical oil painting of Alexander the Great riding a hippity-hop toy into battle."
If a model fails this a couple of times (multiple seeds), we might use a synonym for a hippity-hop, it was also known as a space hopper.
Still failing? We might try to describe the basic physical appearance of a hippity-hop.
Thus, something like GPT-Image-2 scored much higher on the compliance component of the test, requiring only a single attempt, compared with Z-Image Turbo, which required 14 attempts.
I guess it's just a completely personal feeling.
GPT Image 2
Low : 1024×1024 $0.006 | 1024×1536 $0.005 | 1536×1024 $0.005
Medium : 1024×1024 $0.053 | 1024×1536 $0.041 | 1536×1024 $0.041
High : 1024×1024 $0.211 | 1024×1536 $0.165 | 1536×1024 $0.165
GPT Image 1
Low : 1024×1024 $0.011 | 1024×1536 $0.016 | 1536×1024 $0.016
Medium : 1024×1024 $0.042 | 1024×1536 $0.063 | 1536×1024 $0.063
High : 1024×1024 $0.167 | 1024×1536 $0.25 | 1536×1024 $0.25

You can create larger images by creating separate parts you recombine, but they may not perfectly match at their borders.
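Using the GPT Image 2 prices quoted above, the recombination approach is easy to cost out; a quick sketch (the function name is mine, and seam blending is not included):

```python
# GPT Image 2 prices quoted above, USD per 1024x1024 image.
PRICES = {"low": 0.006, "medium": 0.053, "high": 0.211}

def grid_cost(quality, cols, rows):
    """Cost of rendering cols x rows separate square tiles
    to recombine into one larger image."""
    return round(PRICES[quality] * cols * rows, 3)

# A 2048x2048 composite stitched from four high-quality tiles:
print(grid_cost("high", 2, 2))   # 0.844
```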
It is a Landau thing, not a training thing. The idea of an LLM is to work on the unknown.
For example, SDXL was trained on 1MP images, which is why if you try to generate images much larger than 1024×1024 without using techniques like high-res fixes or image-to-image on specific regions, you quickly end up with Cthulhu nightmare fuel.
"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid-splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens."
google drive with the 2 images: https://drive.google.com/drive/folders/1-QAftXiGMnnkLJ2Je-ZH...
Ran a bunch both on the .com and via the API; none of them are nearly as good as Nano Banana.
(My file-share host used to be so good and now it's SO BAD. I've re-hosted with them for now; I'll update to a Google Drive link shortly.)
I couldn't imagine the image you were describing. Here are some of the contradictions I noticed in your prompt:
Macro Close Up - Sharp throughout
Focus on the tiny gear - but also on the tweezers, the old watchmaker's hands, the water drop?
Work on the mechanism of the watch (on the back of the watch) - but show the curved glass of the watch face, which is on the front
This is the biggest one. Even if the mechanism is accessible from the front, you'd have to remove the glass to get to it. It just doesn't make sense, and that is reflected in the images you get. All the elements are there, but they will never cohere because the prompt doesn't make sense.
To illustrate that there aren't any contradictions (other than the final bit about the reflection in the glass): consider a macro shot showing partial hands, partial tweezers, and pocket watch internals. That much is certainly doable. Now imagine the partial left hand holding a half-submerged pocket watch, the fingertips of the right hand holding the front half of tweezers that are clasping a tiny gear, positioned above the work piece with the drop of water falling directly below, captured from the watchmaker's perspective. I could sketch that, so an image model capable of 3D reasoning should have no trouble.
It's precisely the sort of scene you'd use to test a raytracer. One thing I can immediately think to add is nested dielectrics. Perhaps small transparent glass beads sitting at the bottom of the dish of water with the edge of the pocket watch resting on them, make the dish transparent glass, and place the camera level with the top of the dish facing forward?
https://blog.yiningkarlli.com/2019/05/nested-dielectrics.htm...
A second thing I can think to add is a flame. Perhaps place a tealight candle on the far side of the dish, the flame visible through (and distorted by) the water and glass beads?
Do you want it to actually look like macro photography (neither of the generated images do)? Then you can't have it sharp throughout, and you won't be able to show the (sharp) watchmaker's face in a reflection because it would be on a different focal plane.
Dropping the macro requirement, you can show a lot more. You can show that the watchmaker is actually old, you can show the reflection, etc.
Something has to give in the prompt, on multiple of the requirements. The generated images drop the macro requirement and invent some interesting hinging watch-glass contraptions to make sense of it.
Credits: https://github.com/magiccreator-ai/awesome-gpt-image-2-promp...
A 3 * 3 cube made out of small cubes, with a small 2 * 2 cube removed from it - https://chatgpt.com/share/69e85df6-5840-83e8-b0e9-3701e92332...
Create a dot grid containing a rectangle covering 4 dots horizontally and 3 dots vertically - https://chatgpt.com/share/69e85e4b-252c-83e8-b25f-416984cf30...
One where Nano Banana fails but gpt-image-2 worked: create a grid from 1 to 100 and in that grid put a snake, with its head at 75 and tail at 31 - https://chatgpt.com/share/69e85e8b-2a1c-83e8-a857-d4226ba976...
It is a little ambiguous (what exactly is a "3x3 cube") but I tried a bunch of variations and I simply could not get any Gemini models to produce the right output.
https://chatgpt.com/share/69e88b5c-8628-83eb-8851-f587ef2c95...
To the average HN'er, images and design are superfluous aesthetic decoration for normies.
And for those on HN who do care about aesthetics, they're using Midjourney, which blows any GPT/Gemini model out of the water when it comes to taste even if it doesn't follow your prompt very well.
The examples given on this landing page are stock image-esque trash outside of the improvements in visual text generation.
Bad actors can strip sources out so it's a normal image (that's why it's positive affirmation), but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.
Learn more at https://c2pa.org
Yes, let's make all images proprietary and locked behind big-tech signatures. No more open-source image editors or open hardware.
The need for a trusted entity is even mentioned in your specification under the "attestation" section: https://spec.c2pa.org/specifications/specifications/1.4/atte...
So now, if we were to start marking all images that do not have a signature as "dangerous", you would have effectively created an enforcement mechanism in which the whole pipeline, from taking a photo to editing to publishing, can only be done with proprietary software and hardware.
I think the issue is that it's not just bad actors. It's every social platform that strips out metadata. If I post an image on Instagram, Facebook, or anywhere else, they're going to strip the metadata for my privacy. Sometimes the exif data has geo coordinates. Other times it's less private data like the file name, file create/access/modification times, and the kind of device it was taken on (like iPhone 16 Pro Max).
Usually, they strip out everything and that's likely to include C2PA unless they start whitelisting that to be kept or even using it to flag images on their site as AI.
But for now, it's not just bad actors stripping out metadata. It's most sites that images are posted on.
LinkedIn already does this; see https://www.linkedin.com/help/linkedin/answer/a6282984. And X's "made with AI" feature preserves the metadata but doesn't fully surface it (https://www.theverge.com/ai-artificial-intelligence/882974/x...)
In seriousness, social platforms attributing images properly is a whole frontier we haven't even begun to explore, but we need to get there.
https://chatgpt.com/s/m_69e7ffafbb048191b96f2c93758e3e40
But it screwed up when attempting to label middle C:
https://chatgpt.com/s/m_69e8008ef62c8191993932efc8979e1e
Edit: it did fix it when asked.
Generating a 3840x2160 image with gpt-image-2 consumes 13,342 tokens, which is equivalent to $0.40 per image.
This model is more than twice as expensive as Gemini.
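A back-of-the-envelope check of the figures above, working out the implied per-token rate (the variable names are mine):

```python
tokens_per_image = 13_342  # token count quoted above for one 3840x2160 image
cost_per_image = 0.40      # USD, quoted above

# Implied rate in USD per million image-output tokens:
per_million = cost_per_image / tokens_per_image * 1_000_000
print(round(per_million, 2))   # 29.98
```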
this thing is like 5x better than flash at fine grain detail
it is only going to get cheaper
Warning: Verizon math ahead.
direct pdf https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...
Without question.
AI will be indistinguishable from having a team. Communicating clearly has always mattered and always will.
This, however, is even stronger, because you can program and use logic in your communications.
We're going to collectively develop absolutely wild command over instruction as a society. That's the skill to have.
So being able to express oneself clearly in a structured way may not be such an edge.
For example long unstructured rambling might turn out to be a non-issue, while as human I would rank such message low no matter how good it is in other informational aspects.
Looks like ChatGPT Images 2 is now good at this too!
I have a side project where I want to display standup comedy shows. I thought I could edit standup comedy posters with some AI to fit my design. Gemini straight up refuses to change any image of any standup comedy poster involving a well-known person. OpenAI does not care and is happy to edit away.
Just for testing, I just tried this https://i.ytimg.com/vi/_KJdP4FLGTo/sddefault.jpg ("Redesign this image in a brutalist graphic design style"). Gemini refuses (api as well as UI), OpenAI does it
It seems like they're trying to follow local law. What a nightmare to have to manage all jurisdictions around such a product. Surprised it didn't kill image generation entirely.
I think we all know the feeling of getting an image that is ok, but needs a few modifications, and being absolutely unable to get the changes made.
It either keeps coming up with the same image, or gives you a completely new take on the image with fresh problems.
Anyone know if modification of existing images is any better?
Anything better that OpenAI?
ChatGPT Images 2.0 made it unusable at the first turn. At least in the ChatGPT app editing a reference image absolutely destroyed the image quality. It perfectly extracted an illustration from the background, but in the process basically turned it from a crisp digital illustration into a blurry, low quality mess.
I also don't like that these things are trained on specific artist's styles without really crediting those artists (or even getting their consent). I think there's a big difference between an individual artist learning from a style or paying it homage, vs a machine just consuming it so it can create endless art in that style.
Not a lawyer, but that reads as compelled speech to me. Materially misrepresenting an image would be libel, today, right?
The problem is it's all too easy to generate - you can't really do much about an individual piece of slop because there's so much of it. I think we need a way to filter this stuff, societally.
Maybe i'm just bloviating also.
Can you name any countries that you think are functioning, and what their laws are on watermarked AI images?
Taking a picture of an AI generated image aside, theoretically could Apple attest to origin of photos taken in the native camera app and uploaded to iCloud?
Fascinating, by the way, thank you!
Kind of like showing the proctor around your room with your webcam before starting the exam.
—
I think legacy media stands a chance at coming back as long as they maintain a reputation of deeply verifying images, not being fooled.
But yeah the quality is remarkable, and rather scary.
Was this an oversight? Or did their new image generation model generate an image that was essentially a copy of an existing image?
magick image-l.webp image-r.jpg -compose difference -composite -auto-level -threshold 30% diff.png
It's practically all dark except for a few spots. It's the same image, just different size/compression/whatever. I can't find it in any stock image search, though. Surely it could not have memorized the whole image at that fidelity. Maybe I just didn't search well enough.

There is definitely enough empirical validation showing that image models retain lots of original copies in their weights, despite how much AI boosters think otherwise. That said, it is often images that end up in the training set many times, and I would think it strange for this image to do that.
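For readers without ImageMagick, the thresholded-difference idea can be sketched in plain Python. This is a simplification of the one-liner above: it skips the `-auto-level` normalization and works on bare grayscale values, and the synthetic "images" are just flat lists:

```python
def diff_fraction(a, b, threshold=0.30):
    """Fraction of pixels whose normalized absolute difference exceeds
    the threshold -- the same idea as the `-threshold 30%` step.
    `a` and `b` are flat lists of 0-255 grayscale values."""
    assert len(a) == len(b)
    hot = sum(1 for x, y in zip(a, b) if abs(x - y) / 255 > threshold)
    return hot / len(a)

# Two "images": identical except for a few bright spots.
base = [10] * 100
variant = base[:]
variant[3] = variant[40] = variant[77] = 200   # three differing pixels
print(diff_fraction(base, variant))   # 0.03 -> "practically all dark"
```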
Regardless, great find.
No you can’t.
You still have the Studio Ghibli look from the video. The issue with generating manga was the quality of the characters; there are multiple software tools for placing your frames.
But I am hopeful. If I put in a single frame, can it carry over that style for the next images? It would be game changing if a chat could have its own art style
One that I can think of:
- replacing photography of people who may be unable to consent or for whom it may be traumatic to revisit photographs and suitable models may not be available, e.g. dementia patients, babies, examples of medical conditions.
Most other vaguely positive use cases boil down to "look what image generators can do", with very little "here's how image generators are necessary for society".
On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.
Commissioning high quality diagrams from a designer is expensive and I guess it's much cheaper now to essentially commission something but idk, "democratization" still feels weird for just undercutting humans on price.
It's definitely not helpful. It's just annoying and disgusting and a waste of resources IMO. But hey at least Powerpoint presentations have AI slop instead of stuff taken from Google Images!?
I am at the point where I would prefer a poorly human drawn diagram with terrible handwriting over AI slop.
Now, does that justify the harm? Not for me, but this issue is way out of my league.
Helping us navigate things we aren't good at has been one of the main selling points of AI.
I mean, the cat's out of the bag; but the cat stinks.
The question still stands, "are the benefits worth the cost to society", but it bears remembering we do a lot of things for fun which aren't "necessary for society".
I will say, it can be emotionally resonant though - but it's a borrowed property from the perception of human communication and effort that made the art the models were trained on.
>You shouldn't have believed photos since Stalin had Yezhov airbrushed out of them.
It isn't just about propaganda photos, it is about -literally everything-, even things people have no incentive to fake, like cat videos, or someone doing a backflip, or a video of a sunset.
Donald Trump is the president of the United States.
And this is straight out of Putin's playbook: if everything is fake, then people just stop believing in the concept of truth altogether.
Got pretty wild w/the Iranian propaganda that reportedly _resonated with Americans_ (didn't verify that claim)
Slopaganda - https://www.newyorker.com/culture/infinite-scroll/the-team-b...
The advent of digital systems harmed artists with developed manual artistic skills.
The availability of cheap paper harmed paper mills hand-crafting paper.
The creation of paper harmed papyrus craftsmen.
The invention of papyrus really probably pissed off those who scraped the hair off thin leather to create vellum.
My point is that, in line with the Jevons paradox, there is always a wave of destruction that comes with technological transformation, but we almost always end up with more jobs created by the technology in the medium and long term.
Maybe image generators can be a loophole for consent legally, but it seems even grosser morally.
If you're the only one in the world with an internal combustion engine, the environmental impact doesn't matter at all. When they're as common as they are now, we should start thinking about large-scale effects.
1. Generate 100s or 1000s of low-fidelity candidates, find something that matches your vision, iterate.
2. Hand that generated image off to a human and say, "This is what I'm thinking of, now how do we make it real?"
Important: do not skip the last step.
I'm teaching my 4 year old to read. She likes PAW Patrol, but we've kind of exhausted the simple readers, and she likes novelty. So yesterday I had an LLM create a simple reader at her level with her favorite characters, and then turned each text block into a coloring page for her. We printed it off, she and her younger sister colored it, and we stapled it into her own book.
I could come up with ten 3-word sentences myself, of course, but I'm not really able to draw well enough to make a coloring book out of it (in fact she's nearly as good as me). It also helps me think about a grander idea: turning this into something a little more powerful that can track progress (e.g., which phonemes or sight words are mastered and which to introduce or focus on), automatically generate material in a more principled way, add my kids into the stories with illustrations that look like them, etc.
Models will obviously become the foundation of personalized education in the future, and in that context, of course pictures (and video) will be necessary!
AI aside, if you’ve truly exhausted all the simple readers, maybe she should move on to more advanced books instead of repeating more of the same and gamifying it, which seems a great way to destroy a child’s natural curiosity.
You overestimate how many there are. There's like 10 stories at that level. I do also read ones with paragraphs to her, but she can't do those herself because she's 4.
Diagrams and maps. So much text-based communication begs for a diagram or a map.
For example, take a picture of your garden. Ask ChatGPT to give you ideas for how to improve it and a step-by-step visual guide.
Anything that can be expressed visually is effectively target for this technology - this covers pretty much everything.
- package design
- pictures for manuals and guides
- navigation and signs
- booklets, tickets and flyers
- logos of all sorts
- websites
- illustrations for books
And many, many others. Not every image is art, and very few illustrators are artists.
It's not a particularly compelling argument.
It's a true state-change, which makes the argument pretty compelling IMO.
I'm already imagining this is how the local live indie band night I sometimes go to will generate poster images each week for the bands that are playing, whether to put up at the venue or post to social media. And the bands might be using it to design images to put on their t-shirts and other merch. I already know some indie bands using this stuff for their album covers.
Now of course I'm being dramatically absolute. I'm sure I already consume these things without knowing it. These things serve a function. Offloading to AI is the implementer admitting they can't be bothered to care whether it serves the function.
Short kings on tinder no more!
/s
While the image looks nice, the actual details are always wrong, such as pawns in the wrong locations, missing pawns, etc.
Try it yourself with this prompt: Create a poster to show opening game for Queen's Gambit to teach kids to play chess.
I know this is probably mega cherry-picked to look more impressive, but some of the images are terrifyingly realistic. They seem to have put a lot of effort into the lighting.
From the system card someone linked elsewhere in the discussion
At least they aren't pretending that a solution exists.
Anyways I think approaching the problem from both directions is probably good.
Seeing is not believing anymore, and I don't think SynthID or anything like it can restore that trust in images.
Some politician will be recorded doing something & he'll have his people release a thousand photos/videos of him doing crimes. And they'll say, look, it's a smear campaign.
This is just one stupid example, but people will have better schemes.
Also global coordinated releases of fake content and hypertargeted possibly abusive content. Virtual kidnappings will take off, automated & scaled.
And his enemies will do the same, hopefully resulting in less blind trust for everyone in the population, which can only be a good thing.
https://generative-ai.review/2026/04/rush-openai-gpt-image-2...
I've done a series over all the OpenAI models.
gpt-image-2 has a lot more action, especially in the Apple Cart images.
Noticed it earlier while updating my playground to support it
Overall, quite impressed with its continuity and agentic (i.e. research) features.
It has an unprecedented ability to generate the real thing (for example, a working barcode for a real book)
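For context on what "a working barcode" requires: book barcodes are EAN-13 codes whose last digit is a checksum, so the model has to reproduce every digit exactly for a scanner to accept it. A sketch of the standard EAN-13 check-digit computation:

```python
# EAN-13 (used for ISBN-13 book barcodes): the 13th digit is a checksum
# over the first twelve, so a "working" barcode can't be approximate.
def ean13_check_digit(first12: str) -> int:
    # Odd positions (1st, 3rd, ...) get weight 1, even positions weight 3.
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first12))
    return (10 - total % 10) % 10

# Well-known example ISBN: 978-0-306-40615-7
assert ean13_check_digit("978030640615") == 7
```

So rendering a scannable barcode means the model reproduced a digit string that satisfies this arithmetic, not just barcode-looking stripes.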
Consistency? So it fails less often?
Based on the released images, (especially the one "screenshot" of the Mac desktop) I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (ex. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96 so this image is probably fake")
Especially when it comes to detailed outputs or non-standard prompts.
I do believe it will get even better - not sure it will happen within a year but I wouldn't be incredibly surprised if it did.
I experimented with procedural generation of Where's-Waldo-style scavenger images with Flux models, with rather disappointing (if unsurprising) results.
If you asked me what I expected, since this one has "thinking", it'd be that it would've thought to do something like generate the image without Waldo first, then insert Waldo somewhere into that image as an "edit"
It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right
That's because you're focusing a little bit too much on visual fidelity. It's still relatively trivial to create a moderately complex prompt and have it fail miserably.
Even SOTA models only scored a 12 out of 15 on my benchmarks, and that was without me deliberately trying to "flex" to break the model.
Here's one I just came up with:
A Mercator projection of Earth where the land and oceans are inverted (i.e., land = ocean and ocean = land).

I was surprised to see it render a decent comic illustrating an unemployed Pac-Man forced to find work as a glorified pie chart in a boardroom of ghosts.
Is anyone doing this already who can share information on what the best models are?
After 2008 and 2020, vast amounts of money (tens of trillions) were printed, reasonably, by Western governments and never removed from the money supply. So there are vast sums swilling about, funding things like massively computationally intensive work to help me pick a recipe for tonight.
Google and Facebook had online advertising sewn up - but AI is waaay better at answering my queries. So OpenAI wants some of that - but the cost per query must be orders of magnitude larger
So charge me, or my advertisers the correct amount. Charge me the right amount to design my logo or print an amusing cat photo.
Charge me the right cost for the AI slop on YouTube
Charge the right amount - and watch as people just realise it ain’t worth it 95% of the time.
Great technology - but price matters in an economy.
API Pricing is mostly unchanged from gpt-image-1.5, the output price is slightly lower: https://developers.openai.com/api/docs/pricing
...buuuuuuuuut the price per image has changed. For high-quality image generation, the 1024x1024 price has increased? It doesn't make sense that a 1024x1024 would be cheaper than a 1024x1536, so I'm assuming a typo: https://developers.openai.com/api/docs/guides/image-generati...
The submitted page is annoyingly uninformative, but from the livestream it purports to have the exact same features as Gemini's Nano Banana Pro. I'll run it through my tests once I figure out how to access it.
I think you meant more expensive, right? Because it would make sense for it to be cheaper, as there are fewer pixels.
AI generated voice over, likely AI generated script (You see, this model isn't just generating images, it's thinking!). From what it looks like only the editing has some human touch to it?
It does this Apple style announcement which everyone is doing, but through the use of AI, at least for me, it falls right into the uncanny valley.
Yeah, agree. I think it's the first time I'm asking myself: OK, so this cool new tech, what is it good for? In terms of art, it's discarded (art is about humans). In terms of assets: sure, but people are getting tired of AI-generated images (and even if we cannot tell whether a single image is AI-generated, we can know whether companies are using AI to generate images in general, so the appeal is decreasing). Ads? C'mon, that's depressing.
What else? In general, I think people are starting to realize that things generated without effort are not worth spending time on (e.g., no one is going to read your 30-page AI-generated draft; no one is going to review your AI-generated PR touching 500 files; no one is going to be impressed by the images you generate with AI; the same goes for music and everything else). I think we are gonna see a renaissance of the "human-generated" sooner rather than later. I see it already at work (colleagues writing in Slack "I swear the next message is not AI generated" and the like).
I feel like this is something people in the industry should be thinking about a lot, all the time. Too many social ills today are downstream of the 2000s culture of mainstream absolute technoöptimism.
Vide. Kranzberg's first law--“Technology is neither good nor bad; nor is it neutral.”
1: Though personally I hate it, I just cannot not read those as completely different vowels (in particular ï → [i:] or the ee in need; ë → [je:] or the first e here; and ö → [ø] or the e in her)
https://www.arrantpedantry.com/2020/03/24/umlauts-diaereses-...
For icons in particular, this opens up a completely new way of customizing my home screen and shortcuts.
Not necessary for the survival of society, maybe, but I enjoy this new capability.
What a rotten exchange.
AI can probably fool most court judges now. Or the defense can refute legitimate evidence by saying “it’s AI / false”. How would that be refuted?
You might generate an AI video of me committing a crime, But the CCTV on the street didn't show it happening and my phone cell tower logs show I was at home. For the legal system I don't think this is going to be the biggest problem. It's going to be social media that is hit hardest when a fake video can go viral far faster than fact checking can keep up.
So that makes AI a "dual good", like a kitchen knife: you can cut your tomato or kill you neighbor with it, entirely up to the "user". Not all users are good, so we'll see an intense amplification of both good and bad.
I put in one of the driest descriptions of the Holocaust I could find and it got a very high score for bias, calling a factual description of a massacre emotional sensationalism because it inevitably contains a lot of loaded words.
It also doesn't differentiate between reporting, commentary, poetry, or anything else. It takes text and spits out a number, which is a very shallow analysis.
They're adrift, every new "fact" (whether true or false) blows them in a new direction. Often they get led in terrible directions from statements that are entirely true (but missing important context).
A lot of financial cons work that way, a long string of true statements that seem to lead to a particular conclusion. I know that if someone is offering me 20% APY there will usually be some risk or fee that offsets those market-beating gains (it may be a worthwhile risk or a well earned fee, but that number needs to trigger further investigation).
We need people to be equipped with that sort of framework in as many areas as possible, but we seem to be moving backwards in that area.
For the nth time: scale, easiness, and access, matter. AI puts propaganda abilities far beyond the reach of those men in the hands of many more people. Do you not understand the difference between one man with a revolver and an army with machine guns? They are not the same.
Nowhere in my comment am I “blaming the tools”. I’ll ask you engage with the argument honestly instead of simply parroting what you already believe absent reading.
A modern laptop is running almost fanless, like a 486 from the days of yore.
A single H200 pumps out 700W continuously in a data center, and you run thousands of them.
Also, don't forget the training and fine tuning runs required for the models.
Mass transportation / global logistics can be very efficient and cheap.
Before the pandemic, it was in some cases cheaper to import fresh tomatoes from half a world away than to grow them locally. A single container of painting supplies is nothing in the grand scheme of things, especially compared with what data centers are consuming and emitting.
No, in terms of unit economics, I'm almost certain that the painting supplies have a bigger ecological/resource footprint than an LLM per icon generated, and I'm pretty sure the cost of shipping tomatoes does not decrease that footprint, even if it possibly dwarfs it.
But yes, due to the Jevons paradox, the total resource use might well increase despite all that. I, for example, would never have commissioned a professional icon for my silly little iOS shortcuts on my homescreen, so my silly-icon-related carbon footprint went from exactly zero to slightly above that.
Many people think that when a piece of hardware is idle, its power consumption becomes irrelevant, and that's true for home appliances and personal computers.
However, the picture is pretty different for datacenter hardware.
Looking now, an idle V100 (I don't have an idle H200 at hand) uses 40 watts at minimum. That's more than the TDP of many modern consumer laptops and systems. A MacBook Air charges from a 35W power supply, and it charges pretty quickly even under relatively high load.
I want to clarify some more things. A modern GPU server houses 4-8 high-end GPUs. This means 3 kW to 5 kW of maximum power draw per server. A single rack runs around 75-100 kW, and you house hundreds of these racks. So we're talking about megawatts of power. CERN's main power line on the Swiss side had a capacity of around 10 MW, to put things in perspective.
Let's assume an H200 uses 60 W when idle. That means ~500 W of wasted power per server just for sitting around. If a complete rack is idle, that's 10 kW. So you're wasting the power draw of 3-5 houses by sitting and doing nothing.
This calculation only considers the GPUs; the rest of the server hardware adds around 40% on top. Go figure. This is wasting a lot for cat pictures.
And, these "small" numbers add up to a lot.
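The arithmetic above can be reproduced directly; note that the servers-per-rack count (20) is my assumption, chosen to match the ~10 kW idle-rack figure:

```python
# Back-of-envelope reproduction of the idle-power figures above.
# SERVERS_PER_RACK is assumed; the other numbers come from the comment.
IDLE_W_PER_GPU = 60       # assumed H200 idle draw
GPUS_PER_SERVER = 8       # high end of the 4-8 range
SERVERS_PER_RACK = 20     # assumption to match the ~10 kW rack figure
NON_GPU_OVERHEAD = 0.40   # "server hardware adds around 40%"

server_idle_w = IDLE_W_PER_GPU * GPUS_PER_SERVER           # 480 W, i.e. "~500 W"
rack_idle_kw = server_idle_w * SERVERS_PER_RACK / 1000     # 9.6 kW, i.e. "10 kW"
with_overhead_kw = rack_idle_kw * (1 + NON_GPU_OVERHEAD)   # ~13.4 kW per idle rack

print(f"{server_idle_w} W per server, {rack_idle_kw} kW per rack, "
      f"{with_overhead_kw:.1f} kW with overhead")
```

Scaled to hundreds of racks, even the idle floor lands in the megawatt range, which is the point being made.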
A: GPUs use a lot of power!
B: Not all of them are running 100% continuously, eh?,
A: They waste too much power when they're idle, too!
C: None of the H200s are sitting idle, you knob!
I mean, they are either wasting energy sitting idle or doing barely useful work. I don't know what to say anymore. We'll cook ourselves, anyway. Why bother? Enjoy the sauna. ¯\_(ツ)_/¯
> they are either wasting energy sitting idle or doing barely useful work
Now here's a true (inverse) scotsman, or more accurately, a moved goalpost: Work on things you don't deem valuable is basically the same thing as idling?
> We'll cook ourselves, anyway. Why bother? Enjoy the sauna. ¯\_(ツ)_/¯
I'm very concerned about that too, but I don't think we'll avoid the sauna with fatalism or logically unsound appeals to morality about resource consumption.
so if power were plentiful and environmental you'd be onboard with it?
Please see my other comment about energy consumption and connect the dots with how open loop DLC systems are harmful to fresh water supplies (which is another comment of mine).
> so if power were plentiful and environmental you'd be onboard with it?
This is a pretty loaded way to ask this. Let me put this straight. I'm not against AI. I'm against how this thing is built. Namely:
- Use of copyrighted and copylefted materials to train models and hiding under "fair use" to exploit people.
- Moreover, belittling of people who create things with their blood sweat and tears and poorly imitating their art just for kicks or quick bucks.
- Playing fast and loose with environment and energy consumption without trying to make things efficiently and sustainably to reduce initial costs and time to market.
- Gaslighting the users and general community about how these things are built, and how it's a theater, again to make people use this and offload their thinking, atrophying their skills and making them dependent on these.
I work in HPC. I support AI workloads and projects, but the projects we tackle have real benefits, like ecosystem monitoring, long term climate science, water level warning and prediction systems, etc. which have real tangible benefits for the future of the humanity. Moreover, there are other projects trying to minimize environmental impact of computation which we're part of.So it's pretty nuanced, and the AI iceberg goes well below OpenAI/Anthropic/Mistral trio.
As opposed to the illusory/fake/immoral benefits of using LLMs for entertainment purposes (leaving aside all other applications for now)?
How do you feel about Hollywood, or even your local theater production? I bet the environmental unit economics don't look great on those either, yet I wouldn't be so quick to pass moral judgement.
Why not just focus on the environmental impact instead of moralizing about the utility? It seems hard to impossible to get consensus there, and the impact should be able to speak for itself if it's concerning.
I'm not really well versed on the environmental cost, more just (neutrally) pointing out that comparing a single 10s image to a 5-6 hour commission ignores the fact that the majority of these images probably would never have existed in the first place without AI.
A mid-tier top-500 system (think #250-#325) consumes about 0.75 MW. AI data centers consume orders of magnitude more. To cool that behemoth you need to pump tons of water per minute through the inner loop.
Outer loop might be slower, but it's a lot of heated water at the end of the day.
To prevent water wastage, you can go closed loop (for both inner and outer loops), but you can't escape the heat you generate and pump to the atmosphere.
So, the environmental cost is overblown, as in Chernobyl or fallout from a nuclear bomb is overblown.
So, it's not.
The cost to humans living in affected areas was massive and high profile, but it’s very questionable if it was higher than that of an equivalent amount of coal-burning plants. Fortunately not a tradeoff we have to debate anymore, since there are renewables with much fewer downsides and externalities still.
Nuclear bombs (at least those being actually used) by design kill people, so I’m not sure what the externalities even are if the main utility is already to intentionally cause harm.
As a country, we use 322 billion gallons of water per day. A few million gallons for a datacenter is nothing.
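As a scale check on the figures above (the 5 million gallons/day per datacenter is an assumed "few million"):

```python
# Rough scale comparison; the per-datacenter figure is an assumption
# standing in for "a few million gallons" per day.
NATIONAL_GALLONS_PER_DAY = 322e9
DATACENTER_GALLONS_PER_DAY = 5e6  # assumed

share = DATACENTER_GALLONS_PER_DAY / NATIONAL_GALLONS_PER_DAY
print(f"one datacenter: {share:.6%} of national daily water use")
```

On those assumptions a single site is a rounding error nationally, though that says nothing about local impact on the specific watershed it draws from.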
The water gets contaminated and heated, making it unsuitable for organisms to live in, or to be processed and used again.
In short, when you pump that water back into the river, you're poisoning and cooking the river at the same time, destroying the ecosystem.
Talk about multi-threaded destruction.
Pipes rust; you can't stop that. The rust seeps into the water. That's inevitable. Moreover, if moss or other growth starts to take over your pipes, you may need to inject chemicals into your outer loop to clean them.
Inner loops already use biocides and other chemicals to keep them clean.
Look how nuclear power plants fight with organism contamination in their outer cooling loops where they circulate lake/river water.
Same thing.
If you see no difference between them, I can't continue to discuss this with you, sorry.
And I say that as somebody that also finds Ghibli knock-off avatars used by AI bros in incredibly bad taste (or, arguably an even worse crime against taste, a dated 2025 vibe).
I like your discussion style.
I don't want to live in a world in which people get to decide what others can and can't do with their share of resources (after properly accounting for all externalities, including pollution, the potential future value of non-renewable present resources etc. – this is where today's reality often and massively misses that ideal) based on their subjective moral criteria.
Not even just for ethical/moral reasons, but also for practical ones: It’s infinitely harder to get everybody to additionally agree on value of use than on fairness of allocation alone.
After thoroughly mixing these two quite distinct concerns, you'll also have a very hard time convincing me that your concerns for river pollution etc. (which I take very seriously as potentially unaccounted negative externalities, if they exist) are completely free from motivated reasoning about "immoral usage".
My design rules were: no gradients; no purple; prefer muted colors; plenty of sharp corners and overlapping shapes; use the Boba Milky font face.
The difference is very stark:
- The AI has a hard time making the geometric shapes regular. You can see that the stars in the AI version have arms of different sizes at different intervals. It would take a human artist longer to make it look this bad.
- The 5-point stars are still a little rounded in the AI version.
- There is way too much text in the AI version (a human designer might make that mistake, but it is very typical of AI).
- The orange 10-point star on the right with the text “you are the star” still has a gradient (AI really can’t help itself).
- The borders around the title text “Karaoke night!” bleed into the borders of the orange (gradient) 10-point star on the right, but only halfway. This is very sloppy; a human designer would fix that.
- The font face is not Milky Boba but some sort of AI hybrid of Milky Boba, Boba Milky, and Comic Sans.
- And finally, the QR code has obvious AI artifacts in them.
The point I’m making is that it is very hard to prompt your way out of making a poster look like AI, especially when the design is intentionally made not to look like AI.
But they are very different certainly. ChatGPT generated a poster with a very sleek, “produced” style that apes corporate posters whereas you went with a much more personal touch. You are correct that yours does not look like typical AI.
My point is certainly not that the AI poster is better, only that it’s capable of producing surprising results. With minimal guidance it can also generate different styles: https://imgur.com/a/zXfOZaf
I think the trend to intentionally make stuff look “non-AI” is doomed to fail as AI gets better and better. A year or two ago the poster would have been full of nonsense letters.
> And finally, the QR code has obvious AI artifacts in them.
I wonder if this is intentional, to prevent AI from regurgitating someone’s real QR codes.
ETA: Actually, I wonder how much of the “flair” on human-drawn stars is to avoid looking like they are drag-and-drop from a program like Word. Ironic if we’ve circled back around to stars that look perfect to avoid looking like a different computer generated star.
What’s the mechanism that makes an AI ‘better’ at looking non-AI? Training on non-ai trend images? It’s not following prompts more closely. Even if that image had no gradients or pointier shapes, it still doesn’t look like it was made by an individual.
To your counterpoints, notice that you are apologizing for the AI by finding humans that may have done something, sometime, that the AI just did. Of course! It’s trained on their art. To be non-AI, art needs to counter all averages and trends that the models are trained on.
I don’t know. Better training data? More training data? The difference over the past year or two is stark so something is improving it.
> Even if that image had no gradients or pointier shapes, it still doesn’t look like it was made by an individual.
The fact that humans are actively trying to make art that does not look like AI makes it clear that AI is not so obvious as many would like to pretend. If it were obvious, no one would need to try to avoid their art looking like AI.
> To your counterpoints, notice that you are apologizing for the AI by finding humans that may have done something, sometime, that the AI just did. Of course! It’s trained on their art.
Obviously.
> To be non-AI, art needs to counter all averages and trends that the models are trained on.
So in order to not look like AI, art just has to be so unique that it’s unlike any training data. That’s a high bar. Tough time to be an artist.
About the stars: I know designers draw imperfect stars. I even did that in my design; in particular, I stretched one and rotated it slightly. A more ambitious designer might go further and drag a couple of vertices around to exaggerate them relative to the others. But usually there is some balance in those decisions. The AI, however, just puts the vertices wherever, and it is ugly and unbalanced. A regular geometric shape with a couple of oddities is a normal design choice, but a geometric shape which is all oddities is a lot of work for an ugly design. Humans tend not to do that.
I don’t think this is a productive choice, but it’s certainly yours to make.
> but a geometric shape which is all oddities is a lot of work for an ugly design. Humans tend not do to that
I find this such an odd thing to say. It’s way easier to draw a wonky star than a symmetrical one. Unless “drawing” here means using a mouse to drag and drop a star that a program draws for you.
Vintage illustrations are full of nonsymmetrical shapes. The classic Batman “POW” and similar were hand drawn and rarely close to symmetrical.
Apart from me, my partner also does graphic design, and unlike me she values her sanity more than open source, so she uses Illustrator for her designs. In Adobe’s walled-garden world of proprietary software it is still the same story: you generally use the dedicated tools to get regular shapes (or patterns) and then alter them after they are drawn. You don’t draw them from scratch. If you are familiar with modular analog synthesizers, this is like starting with a square wave and then subtracting from the signal to shape it into a more natural-sounding form.
At small scales what "art" does your business need? If you can't afford to hire an artist (which is completely fine, I couldn't for my business!) do you really need the art or are you trying to make your "brand" look more polished than it actually is? Leverage your small scale while you can because there isn't as much of an expectation for polish.
And no, a band poster doesn't have to be a labor of love. But it also doesn't have to be some big showy art either. If I saw a small band with a clearly AI generated poster it would make me question the sources for their music as well.
Very few bands would agree with that statement.
1) It's made from copyrighted works, and the original authors receive no credit.
2) It is (typically) low-effort.
3) There are numerous negative environmental effects of the AI industry in general.
4) There are numerous negative social effects of AI in general, and more specifically, AI-generated imagery is used a lot for spreading misinformation.
5) There are numerous negative economic effects of AI; specifically with art, it means real human artists are being replaced by AI slop, which is of significantly lower quality than the equivalent human output. Also, instead of supporting multiple different artists, you're siphoning your money to a few billion-dollar companies (this is terrible for the economy).
As a side note, if you have a business which truly cannot afford to pay any artists, there are a lot of cheaper, (sometimes free!) pre-paid art bundles that are much less morally dubious than AI. Plus, then you're not siphoning all of your cash to tech oligarchs.
People are saying, very clearly, that they're not willing to put effort into something produced by someone who put no effort in.
<joke>What's your rock band called, "SEC Form 10-K"?</joke>
I know this is controversial in tech spaces. But most people, particularly those in art spaces like music actually appreciate creativity, taste, effort, and personal connection. Not just ruthless efficiency creating a poster for the lowest cost and fastest time possible.
If your business can't afford to spend $5 on Fiverr, it's not a business. It's not even panhandling.
Your quip is pithy but meaningless.
I could have generated my own content, so just send the prompt rather than the output to save everyone time.
Again - your quip sounds good but when you think about it, it's flatly wrong.
There is a mass, bland appeal to “better” things but it’s not ubiquitously desired and there will always be people looking outside of that purely because “better” is entirely subjective and means nothing at all.
Is an AI generated photo of your app/site going to be more accurate than a screenshot? Or is an AI generated image of your product going to convey the quality of it more than a photo would?
I think Sora also showed that the novelty of generating just "content" is pretty fleeting.
I would be interested to see if any of the next round of ChatGPT advertisements use AI generated images. Because if not, they don’t even believe in their own product.
Edit: One of the possible outcomes may be living in a world like in "Them" with glasses on. Since no expression has any meaning anymore, the message is just there being a signal of some kind. (Generic "BUY" + associated brand name in small print, etc.)
I'm not sure you immediately lose meaning if someone can make a highly personalized version of something easily. The % of completely meaningless video after YouTube and tiktok came about has skyrocketed. The amount of good stuff to watch has gone up as well though.
But so many people want to make art, and it's so cheap to distribute it, that art is already commoditized. If people prefer human-created art, satisfying that preference is practically free.
But I think the focus on novelty is misplaced. Any random number generator can arbitrarily create a "novel" output that a human has never seen before. The issue is whether something is both novel and useful, which is hard for even humans to do consistently.
I’m so tired of “there’s nothing preventing” and “humans do that too”. Modern AI is just not there. It’s not like humans, and it has difficulty adapting to novelty.
Whether transformers can overcome that remains to be seen, but it is not guaranteed. We’ve been dealing with these same issues for decades and AI still struggles with them.
What? Those items are luxuries when made by humans because they are physical goods where every single item comes with a production and distribution cost.
I just recently used AI image generation to design my balcony.
It was a great way to see design ideas imagined in place and decide what to do.
There are many cases where people would hire an artist to illustrate an idea or early prototype. AI-generated images let you do that yourself, or 10x faster than a few years ago.
Notwithstanding a few code violations, it generated some good ideas we were then able to tweak. The main thing was we had no idea of what we wanted to do, but seeing a lot of possibilities overlaid on the existing non-garden got us going. We were then able to extend the theme to other parts of the yard.
Also, this can’t be real. How many publications did they train this stuff on, and why is there no acknowledgment, even if only to say: we partnered with xyz manga house to make our model smarter at manga? Like, what’s wrong with this company?
There is nothing that cannot harm: knives, cars, alcohol, drugs. A society needs to balance risks and benefits. Words can be used to do harm, email, anything; it depends on intention and its type.
I became totally indifferent after reviewing my spending habits for unnecessary stuff, prompted by watching world championships for niche sports. For some this is a calling, for others a waste. It is a numbers game then.
Is that true? I don't think I'd get tired of images that are as good as human-made ones just because I know/suspect there may have been AI involved.
Visual explanations are useful, but most people don't have the talent and/or the time to produce them.
This new model (and Nano Banana Pro before it) has tipped across the quality boundary where it actually can produce a visual explanation that moves beyond space-filling slop and helps people understand a concept.
I've never used an AI-generated image in a presentation or document before, but I'm teetering on the edge of considering it now provided it genuinely elevates the material and helps explain a concept that otherwise wouldn't be clear.
- The usual advantages of vector graphics: resolution-independence, zoom without jagged edges, etc.
- As a consequence of the above, vector graphics (particularly SVG) can more easily be converted to useful tactile graphics for blind people.
- Vector graphics can more practically be edited.
I think what we'll find is that visual design is no longer as much of a moat for expressing concepts, branding, etc. In a way, AI-generated design opens the door for more competition on merits, not just those who can afford the top tier design firm.
I used to have an assistant make little index-card-sized agendas for get-togethers when folks were in town, or when I was organising a holiday or offsite. They used to be physical; now it's a cute thing I can text around so everyone knows when they should be up by (and by when, if they've slept in, they can go back to bed). AI has been good at making these. They don't need to be works of art, just cute and silly and maybe embedded with an inside joke.
If this is the best use case that exists for AI image generation, I'm only further convinced the tech is at best largely useless.
Because I’ll then spend hours playing with the typography (because it’s fun) and making it look like whatever design style I’ve most recently read about (again, because it’s fun) and then fighting Word or LaTeX because I don’t actually know what I’m doing (less fun). Outsourcing it is the right move, particularly if someone else is handling requests for schedules to be adjusted. An AI handles that outsourcing quicker for low-value (but frequent) tasks.
> If this is the best use case that exists for AI image generation
I’ve also had good luck sketching a map or diagram and then having the AI turn it into something that looks clean.
Look, 99% of my use cases are e.g. making my cat gnaw on the Tetons or making a concert of lobsters watching Lady Gaga singing “I do it for the claws” or whatever so I can send two friends something stupid at 1AM. But there does appear to be a veneer of productivity there, and worst case it makes the world look a bit nicer.
It's good that my friends don't make a coffee date feel like a board meeting (with an agenda shared by post 14 working days ahead of the meeting, form for proxy voting attached).
If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself.
Edit: I'm not an outlier here. There have even been sitcom episodes about overbearing hosts over-programming their guests' visits, going back at least to the Brady Bunch.
Okay. I'd be confused why you didn't voice up while we were planning everything as a group, but those people absolutely exist. (Unless it's someone's, read: a best friend or my partner's, birthday. Then I'm a dictator and nobody gets a choice over or preview of anything.)
I like to have a group activity planned on most days. If we're going to drive to get an afternoon hike in before a dinner reservation (and if I have 6+ people in town, I need a dinner reservation, because no, I'm not cooking every single evening), or if I've paid for a snowmobile tour or a friend is bringing out their telescope for stargazing, there are hard no-later-than departure times to either not miss the activity or be respectful of others' time.
My family used to resolve that by constantly reminding everyone the day before and morning of, followed by constantly shouting at each other in the hours and minutes preceding and–inevitably–through that deadline. I prefer the way I've found. If someone wants to fuck off from an activity, myself included, that's also perfectly fine.
(I also grew up in a family that overplanned vacations. And I've since recovered from the rebound instinct, which involves not planning anything and leaving everything to serendipity. It works gorgeously, sometimes. But a lot of other times I wonder why I didn't bother googling the cool festival one town over beforehand, or regretted sleeping in through a parade.)
> There have even been sitcom episodes about overbearing hosts over-programming their guests' visits
Sure. And different groups have different strokes. When it comes to my friends and I, generally speaking, a scheduled activity every other day with dinners planned in advance (they all get hangry, every single fucking one of them) works best.
I get this sounds elitist, but a tremendous percentage of the population is happily and eagerly engaging with fake religious images, funny AI videos, horrible AI memes, etc. Pointing out that this video of a puppy is completely AI-generated results in a vicious defense and mansplaining of why the video is totally real (I love it when the video has e.g. Sora watermarks... this does not stop the defenders).
I agree with you that human connection and artist intent is what I'm looking for in art, music, video games, etc... But gawd, lowest common denominator is and always has been SO much lower than we want to admit to ourselves.
Very few people want thoughtful analysis that contradicts their world view, very few people care about privacy or rights or future or using the right tool, very few people are interested in moral frameworks or ethical philosophy, and very few people care about real and verifiable human connection in their "content" :-/
It's been true for various technologies that HN (and tech audiences in general) have a more nuanced view, but AI flips the script on that entirely. It's the tech world who are amazed by this, producing and being delighted by endless blogposts and 7-second concept trailers.
If a work of art is good, then it's good. It doesn't matter if it came from a human, a neanderthal, AI, or monkeys randomly typing.
When I watch a Lynch film I feel some connection to the man David Lynch. When I see an AI artwork, there is nothing to connect with; no emotional experience is being communicated, it is just empty. Its highest aspiration is elevator music, just being something vaguely stimulating in the background.
I don't think gamers hate AI; it's just a vocal minority, imo. What most people dislike is sloppy work, as they should, but that can happen with or without AI. The industry has been using AI for textures, voices, and more for over a decade.
It’s really not. That's actually a pet peeve of mine as someone who used to spend a lot of time messing with pixel art in Aseprite.
Nobody takes the time to understand that the style of pixel art is not the same thing as actual pixel art. So you end up with these high-definition, high-resolution images that people try to pass off as pixel art, but if you zoom in even a tiny bit, you see all this terrible fringing and fraying.
That happens because the palette is way outside the bounds of what pixel art should use; proper pixel art is generally limited to maybe 8 to 32 colors.
There are plenty of ways to post-process generative images to make them look more like real pixel art (square grid alignment, palette reduction, etc.), but it does require a bit more manual finesse [1], and unfortunately most people just can’t be bothered.
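The two post-processing steps mentioned above (square grid alignment and palette reduction) can be sketched in a few lines. This is a minimal pure-Python illustration under my own assumptions, not any particular tool's implementation: the function names are hypothetical, the image is represented as a row-major list of lists of RGB tuples, and a real pipeline would more likely use Pillow or Aseprite's own tooling.

```python
def nearest_palette_color(pixel, palette):
    """Snap one RGB pixel to the closest color in a limited palette
    (squared Euclidean distance in RGB space)."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))

def pixelate(image, cell, palette):
    """Align an image to a square grid of cell x cell blocks and reduce
    it to the given palette, as proper pixel art would be."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for by in range(0, height, cell):
        for bx in range(0, width, cell):
            # Average the block so every pixel inside a grid cell agrees,
            # then snap that average to the nearest palette color.
            block = [image[y][x]
                     for y in range(by, min(by + cell, height))
                     for x in range(bx, min(bx + cell, width))]
            avg = tuple(sum(p[i] for p in block) // len(block) for i in range(3))
            color = nearest_palette_color(avg, palette)
            for y in range(by, min(by + cell, height)):
                for x in range(bx, min(bx + cell, width)):
                    out[y][x] = color
    return out
```

Running a generated image through something like this kills the fringing and fraying, because every pixel inside a grid cell ends up identical and every color comes from the small fixed palette.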
You'd think the kickbacks the leaders of these towns are getting for allowing data centers to be built would go towards improving infrastructure, but hah, that's unrealistic.
WTF is that unrealistic? SMH
Do you have any references for such cases? I have seen talk of such things being at risk, but I am unaware of any specific instances of it occurring.
The article attempts a sleight of hand with the specific instance it cites: the loss of water is alleged to be caused by sediment from construction rather than by water use.
It's not great that it happened and it is something local government should take action on, but it is also something that could have been caused by any form of industrial construction. I suspect there are already laws in place that cover this. If they are not being enforced that's another issue entirely.
I dunno how long this is going to hold up. In 50 years, when OpenAI has long become a memory, post-bubble burst, and a half-century of bitrot has claimed much of what was generated in this era, how valuable do you think an AI image file from 2023 - with provenance - might be, as an emblem and artifact of our current cultural moment, of those first few years when a human could tell a computer, "Hey, make this," and it did? And many of the early tools are gone; you can't use them anymore.
Consider: there will never be another DallE-2 image generation. Ever.
That's it. I can't think of a single actual use case outside of this that isn't deliberately manipulative and harmful.
Agreed mostly, BUT
I'm building tools for myself. The end goal isn't the intermediate tool, they're enabling other things. I have a suspicion that I could sell the tools, I don't particularly want to. There's a gap between "does everything I want it to" and "polished enough to justify sale", and that gap doesn't excite me.
They're definitely not generated without effort... but they are generated with 1% of the human effort they would require.
I feel very much empowered by AI to do the things I've always wanted to do. (when I mention this there's always someone who comes out effectively calling me delusional for being satisfied with something built with LLMs)
As for advertising being depressing: it's a little late to get up on the high horse of anti-ads for tech after two decades of ad-based technology dominating everything. Go outside, see all those bright shiny glittery lights? Those aren't society-created images to embolden the spirit and dazzle the senses, those are ads.
North Korea looks weird and depressing because they don't have ads. Welcome to the west.
As with anything AI, we are not ready for the scale of impact. And for what? Like, why are you proud of this?
Maybe it's meant to convey pace & hype
But the broader concept of fake news and the manufactured nature of media and rhetoric is much more relevant - e.g. whether or not something's AI is almost immaterial to the fact that any filmed segment does not have to be real or attributed to the correct context.
It's an old internet classic just to grab an image and put a different caption on it, relying on the fact that no one can discern context or has time to fact-check.
> Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
Image generation is now pretty much "solved". Video will be next. Perhaps things will turn out the same as chess: even after IBM's Deep Blue beat the world champion, we still value humans playing chess. We value "hand made" items (clothes, furniture) over the factory-made stuff. We appreciate and value human effort more than machines. Do you prefer a hand-written birthday card or an email?
Feels like now is a bit of a catchup after pretty tepid period that was most of my life.
Photographs, videos, and digital media in general, in contrast, are used for much, much more than just socializing.
I would imagine this will hit illustrators / graphics designers / similar people very hard, now that anyone can just generate professional looking graphical content for pennies on the dollar.
I don't think it'll fail like Sora though. gpt-image-1.5 didn't fail.
but in general though - will people believe in anything photographic ?
imagine dating apps, photographic evidence.
I'm guessing we're gonna reach a point where - you fuck up things purposely to leave a human mark.
Storefronts like Steam require disclosing use of AI assets for art. In most indie dev spaces, devs are scolded for using AI art in their games. I wonder if this perspective will change in a few years.
Hopefully film makes a come back.
It's just another step into hell.
The person you're replying to is making a joke about OpenAI shutting down Sora their video generation "social media" app recently.
Never before in history did humanity have the possibility of seeing a picture of a pack of wolves! The dearth of photographs has finally been addressed!
I told my AI girlfriend that I will save money to have access to this new technology. She suggested a circular scheme where OpenAI will pay me $10,000 per year to have access to this rare resource of 21st-century daguerreotype.
https://www.gally.net/temp/20260422-chatgpt-images-2-example...
Later Google tried the same thing: "Apple, we will give you a $1 billion a year refund." What's changed in two and a half years?