Pretty much mirrors my experience using GPT to generate images creatively. I tried to generate an image to accompany a Robert Frost poem and it made something... plausibly related. But not what I was describing. I spent the next 90% of the time making it 10% closer to what I wanted, but it never got all the way there.

I’ve given it different levels of open-endedness ("give this flow chart an aesthetic like this mechanical keyboard," or "generate an SVG of this graphic from a 70s slide show"), but it never looks quite like what I have in mind.

In the end, I think you only use this stuff to generate images if you’re prepared to accept whatever comes out on approximately the first try.

by TheOtherHobbes 21 hours ago

This isn't a solvable problem without world models. Tokenised prompting is like stabbing a pin at a huge target in the dark. Sometimes something interesting falls out, but latent space doesn't have the definition to give most people exactly what they want.

When it does, it's more likely to be something popular and unoriginal, where the data is dense, and less likely to be something inventive and strange.

by xienze 21 hours ago

> This isn't a solvable problem without world models.

I wish we could use something like a simple DSL rather than English prose to work with these models, in order to have some real precision to describe what we want.

by asnyder 17 hours ago

Nothing stops that from happening. It just needs to be trained on that DSL. Though at that point it returns to its original form as a better autocomplete/IntelliSense :).

That will likely happen in the specialized fields. We can already see tools like Figma, Mira, and others that generate functional-ish frontend components in full TypeScript with corresponding styles (which are also selectable and configurable in the interface). They're not quite as free-form, though, since they load their base framework and components to ensure consistency and sanity/error-checking, etc. But even then, they are in fact generating usable, modifiable components that you can work with precisely in your normal DSL.

For video, this likely exists, or is being worked on as we speak. All specialized domain tools will move toward this model so that domain experts can use them with the precision they expect AND the agentic gains we already take for granted.

by Marazan 11 hours ago

If only there was some kind of formalised "language" to, as it were, "programme" the automata but alas such a concept is impossible to conceptualise.

by userbinator 19 hours ago

> i probably got 3 good videos out of 100 gens

My experience with AI image generation is similar, though with a higher success rate (depending on how accurate you want the result to be). But indeed, filtering is a major part of the process.

by bananamogul 21 hours ago

In my experience, Sora was fantastic for what it did. Light years better than Adobe Firefly. On par with Leonardo.

A lot of YouTube content is really just talk, so it was easy to use Sora clips as video content while you talked over them.

However, its failure was that it watermarked everything. WTF? Leonardo didn't do that. Neither did other models. So while video gen was excellent, you always had these ridiculous floating watermarks.