For a litmus test of your perspective, try using Sora. Try to make a video that makes someone genuinely laugh. Sora doesn't prompt itself. Human creativity and humor are still required.
Sure, it was moderated to heck, like all models attempting to avoid PR disasters (see Grok), but, just as with YouTube and broadcast TV, there's still a corporate-friendly surface area - one that excludes porn, gore, etc. - that people can enjoy. And yes, people like different things.
Like, imagine if you watched a bunch of GenAI videos of cars sliding on ice from the driver’s perspective. The physics is wrong, and surely it’s going to make you a worse driver because you are feeding your internal prediction engine incorrect training data. It’s less likely that you’ll make the right prediction in real life when it counts.
But I think I do have similar feelings about special effects. A difference is that special effects tend to depict scenarios very outside of the envelope of normal experience, so probably not very damaging if my model of “what does a plane crash look like” is screwed up.
Though some effects probably are damaging - how many people subconsciously assume cars explode when they're in an accident? A poor mental model of the odds of a car exploding could cause you to make poor real-life decisions (like moving someone out of a wrecked car in a panic instead of waiting for EMS, risking a spine/neck injury).
Your counter-examples have the property that most of what you need to learn is absent from the media being watched, which leads to an observation that is "obviously" true - but they ignore the impact of media on a viewer who is integrating it with other sources of information. To compare fairly with the mental models being discussed, you'd have to actually weigh the effects you're writing off as negligible; for something like a world model, which we've learned purely by observation and which isn't backed by much specialized knowledge, those effects might be much more impactful.
Most people can’t explain the physics they see, but they can deduce enough to be able to predict the effects of physical actions most of the time.
Sure, be ready to get them out, and if they’re trapped and it’s going to be a while until fire shows up start working on that. But my mental model is that for any road legal car that is not currently on fire, there is a higher chance you’ll cause harm by rashly moving a victim than that a victim will be suddenly consumed by an enormous Hollywood style conflagration.
Films shot on film using in-camera effects are still made on occasion, but they're art films for niche audiences.
But we’ll never get another Ben Hur. And that doesn’t sit well with me even if society can’t yet fully explain why.
The worst offenders are brake sounds that don't correlate with the car's movement, engine sounds that don't correlate with its acceleration, nonsensical deceleration while braking, and steering wheel movement that doesn't correlate with the car's steering.
I am willing to suspend disbelief for Terminator 1, even when it's obviously a doll's head in the shot.
But it is insulting to feed slop to your audience; it shows you didn't even try.
I have actually seen one slop video that I kinda enjoyed - it was as obvious that great effort had been put into the script and details as it was obvious it wasn't being passed off as the real thing.
"AI" consumes energy before the user has even started (during training).
That is on top of the energy comparison for each particular use.
Model training is similar to the creation of the CGI for a movie. Both happen before anyone consumes the output and represent the up-front cost for the producer.
Both a movie and a language model can cost tens or hundreds of millions of dollars to produce.
In both cases additional infrastructure is needed for efficient usage: movie theaters or streaming platforms for movies, and data centers with GPUs for LLMs. These are also upfront (capex) costs.
At consumption time, the movie requires some additional resources per viewing, whether it's in a theater or via streaming. Likewise, an LLM consumes some resources at inference time. These are opex. In both cases, the marginal cost of inference/consumption is quite low.
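To make the capex/opex framing concrete, here's a minimal sketch of how an upfront cost amortizes across many uses. The numbers are purely illustrative assumptions, not real training, rendering, or inference figures; the point is only that the upfront share shrinks toward the marginal cost as usage grows.

```python
# Illustrative amortization of an upfront (capex) cost over many uses,
# plus a per-use (opex) cost. All numbers are made up for illustration.

def cost_per_use(upfront: float, per_use: float, uses: int) -> float:
    """Cost attributed to a single use: amortized upfront + marginal."""
    return upfront / uses + per_use

# Hypothetical figures in arbitrary "energy units":
training = 1_000_000.0   # one-time cost, like model training or CGI rendering
inference = 0.5          # marginal cost per request / per viewing

for n in (1_000, 1_000_000, 100_000_000):
    print(n, cost_per_use(training, inference, n))
# 1_000       -> 1000.5  (upfront cost dominates)
# 1_000_000   -> 1.5
# 100_000_000 -> 0.51    (marginal cost dominates)
```

The same shape holds whether the upfront term is a CGI render farm or a training run: at small audience sizes the capex dominates, at large audience sizes the per-use opex does.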
> Model training is similar to the creation of the cgi for the movie. Both happen before anyone consumes the output
I did not say anything about consumption of the output. Maybe you misread what I wrote; it is about energy consumption.

> Both a movie and a language model can cost
But we weren't comparing the cost of a movie to the cost of a language model.

> can cost tens or hundreds of dollars
But we weren't talking about dollars, we were talking about energy. We're clearly exploring different questions.
CGI renders do use a lot of electricity relative to playing back the movie for individual viewers. It's perfectly analogous.
> CGI renders do use a lot of electricity relative to playing back the movie for individual viewers. It's perfectly analogous.
I've literally laughed out loud after reading this. I can't believe you're stretching this in good faith.
But if you are - well, you certainly have a unique perspective.
I am 100% with you. I didn't ever _use_ Sora, but some of it trickled down to me (mostly through Instagram reels). I think it's amazing that we have such great new tools to express ourselves, and that we are trying out new platforms, paradigms, and approaches.
Is there money involved? Absolutely, but I don't fault companies for trying to earn their keep.
It 100% takes work to use these tools the right way to make something funny. Ask an LLM to make a video on its own and it'll hardly evoke laughs (I'm sure that'll change too, though).
Then, when they start ratcheting the slop ratio up (likely under the justification of keeping up with declining creator engagement), the consumers get more and more adjusted to a pure-slop feed, until bingo you have a direct line into the midbrain of millions of consumers/voters/parents/employees/serfs.
The real problem with AI slop is not the AI. It's the people. It's always the people.
The clickbait has started fooling people more than before, with the latest videos being halfway believable (except for the circumstances of the videos).
Technology enables the most malicious and self-interested, and systems need to be adjusted to not reward that, or users need to become wise to it.
With the amount of early 2000's style clickbait ads still around, I'm not sure we ever vanquished Web 1.0 style clickbait, it just got crowded out by ever more sophisticated forms.
The percentage of AI videos over the internet will certainly not decrease after Sora is gone.
The question is when Chinese coding models will have their Seedance moment and squeeze Opus/Codex out of the market. It weirdly feels impossible and inevitable at the same time.
It's much easier to make Qwen animate Tank Man than it is to make any Western model generate indigenous people dancing, because, cough cough, naked skin is baaaaad. Except the Musk one, which will nonetheless be affected by all the copyright mess.
Then it became synonymous with slop - lowest-common-denominator content made without care - instead of a tool enabling people willing to put in varying levels of skill, expertise, and effort, the way coding models did.
The existence of inoffensive use cases doesn't invalidate anything OP is saying; that's just a natural human reaction to overexposure to a technology.
In the span of less than two years, pretty much everywhere I look has been inundated with zero-effort spam, manipulated imagery, etc., which has had a net-negative impact on my life. Even if it's also helpful for a small business making a flyer or whatever, without actively making my life worse, that doesn't really move the needle on my overall attitude.
> manipulated imagery
And we thought iPhone camera videos were bad... (they were (and are), though.) It's so dumb that Zuck and Elmo want to inject^H^H^H^H^H^Hrecommend content into these people's feeds while they're checking in on their nieces and nephews and local book clubs.
- You're making an unsubstantiated claim
- personally targeting someone you don't even know
- in order to celebrate the presumed success of a mass fraud?
If you want a video of a dancing cat, sure, you can get that. But if you want an orange tabby doing the moonwalk or the robot, that's a lot harder. You'll have to generate dozens of videos and fine tune prompt incantations before you get what you want, if you even do before you hit a rate limit or you get frustrated. If you want something specific and unique and interesting, you still need to put in a lot of effort. Therefore, most videos that people actually make and share are pretty generic.
I think most art models have subtle tells and limitations similar to textual LLMs too, just a little harder to recognize. Certain ideas and imagery will be easier to generate and more likely to fill in the gaps of your prompt. The technology is fascinating compared to the nothing that we had before, but it still has real limitations - try to get it to generate an Italian plumber wearing a red hat that isn't Mario, for example.
All that to say, the trend towards low effort, repetitive, and uncreative results is inherent in the medium. Most users will prompt for a generic dancing cat and get something resembling a cat doing something that resembles a dance and that will flood social media. The few people going for a more creative and specific artistic view will be frustrated by the constant rolling of dice, and if they do make something they work hard on, it will be drowned out by the low effort slop posts. And if you're frustrated by those limitations and want to make something intentional, then you'll eventually gravitate towards Photoshop or Blender where you can actually craft the exact thing you want.
These models do not really "democratize art", they just make it really easy to generate visually interesting noise. Once the novelty wears off, the limitations are apparent. Art has always been democratized anyway - Blender and Krita are free, and pencils are cheap.
Novels, cinema, television, comic books, etc.
They were all considered careless, skill-free slop at some point.