This isn't a solvable problem without world models. Tokenised prompting is like sticking a pin into a huge target in the dark. Sometimes something interesting falls out, but latent space doesn't have the resolution to give most people exactly what they want.

When it does, it's more likely to be something popular and unoriginal, where the data is dense, and less likely to be something inventive and strange.

reply
> This isn't a solvable problem without world models.

I wish we could use something like a simple DSL rather than English prose to work with these models, in order to have some real precision to describe what we want.
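To make the idea concrete, here is a minimal sketch of what such a DSL might look like: a few `key: value` lines parsed into a structured request. The syntax and field names are entirely invented for illustration; no real model accepts this format.

```python
# Hypothetical mini-DSL for specifying a generation request with some
# precision, instead of free-form English prose. Everything here is an
# invented illustration, not a real interface.
from dataclasses import dataclass, field

@dataclass
class Spec:
    subject: str = ""
    constraints: list[str] = field(default_factory=list)
    exclusions: list[str] = field(default_factory=list)

def parse_spec(text: str) -> Spec:
    """Parse lines of the form 'subject: x', 'require: y', 'exclude: z'."""
    spec = Spec()
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "subject":
            spec.subject = value
        elif key == "require":
            spec.constraints.append(value)
        elif key == "exclude":
            spec.exclusions.append(value)
    return spec

spec = parse_spec("""
subject: a red bicycle on a bridge
require: overcast lighting
require: 35mm lens look
exclude: people
""")
```

The point is that a structured spec like this can be validated before it ever reaches the model, which is exactly the precision free prose lacks.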

reply
Nothing stops that from happening; the model just needs to be trained on that DSL. Though at that point it returns to its original form as a better autocomplete/IntelliSense :).

That will likely happen in specialized fields. We can already see tools like Figma, Mira, and others that generate functional-ish frontend components in full TypeScript with corresponding styles (also selectable and configurable in the interface). They aren't quite as free-form, since they load their own base framework and components to ensure consistency, sanity, and error-checking, but even then they generate usable, modifiable components that you can work with precisely in your normal DSL.

For video, this likely exists, or is being worked on as we speak. All specialized domain tools will move toward this model, letting domain experts use them with both the precision they expect AND the agentic gains we already take for granted.

reply
If only there were some kind of formalised "language" to, as it were, "programme" the automata, but alas, such a concept is impossible to conceptualise.
reply