> each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.

My Git history contains links between the false starts and misunderstandings and their corrections, and each correction includes a paragraph on why it was a misunderstanding or false start. That is a lot better than a single linear log from an LLM.

by 1718627440 1 day ago

> each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.

by ajkjk 1 day ago

true, but that just means that's the problem to solve. probably the ideal architecture isn't possible right now. But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it, so that eventually it becomes a full 'spec'.

And maybe there is a way to trim the parts out of it that are not needed... like to automatically produce an initial prompt which is equivalent to the results of a longer session, but is precise enough so as to not need clarification upon reprocessing it. Something like that? I'm not sure if that's something that already exists.

by sarchertech 19 hours ago

> But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it

Why would you think this though? There are an infinite number of programs that can satisfy any non-trivial spec.

We have theoretical solutions to LLM non-determinism, but we have no theoretical solutions to prompt instability, especially when we can’t even measure what "correct" is.

by ajkjk 14 hours ago

yeah but all of the infinite programs are valid if they satisfy the spec (well, within reason). That's kinda the point. Implementation details like how the code is structured or what language it's in are swept under the rug, akin to how today you don't really care what register layout the compiler chooses for some code.

by sarchertech 13 hours ago

There has never been a non-trivial program in the history of the world that could just “sweep all the implementation details under the rug”.

Compilers use rigorous modeling to guarantee semantic equivalence, and that is only possible because they are translating between formal languages.

A natural language spec can never be precise enough to specify all possible observable behaviors, so your bot swarm trying to satisfy the spec is guaranteed to constantly change observable behaviors.

This gets exposed to users as churn, jank, and workflow-breaking bugs.
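To make the point about observable behavior concrete, here is a minimal sketch. It assumes a one-line natural-language spec, "return the users sorted by age"; the `User` type, both function names, and the `satisfies_spec` check are hypothetical and made up for illustration. Both implementations pass the spec, yet a user looking at the list would see a different order depending on which one was regenerated last.

```python
from typing import NamedTuple


class User(NamedTuple):
    name: str
    age: int


def sort_by_age_keep_input_order(users):
    # Hypothetical implementation A: ties keep their input order,
    # because Python's sorted() is stable.
    return sorted(users, key=lambda u: u.age)


def sort_by_age_then_name(users):
    # Hypothetical implementation B: ties are broken alphabetically by name.
    return sorted(users, key=lambda u: (u.age, u.name))


def satisfies_spec(result, original):
    # The assumed spec: same users, ages in non-decreasing order.
    # It says nothing about how ties are ordered.
    same_users = sorted(result) == sorted(original)
    ages_ordered = all(a.age <= b.age for a, b in zip(result, result[1:]))
    return same_users and ages_ordered


users = [User("zoe", 30), User("adam", 30), User("kim", 25)]
a = sort_by_age_keep_input_order(users)
b = sort_by_age_then_name(users)

print(satisfies_spec(a, users), satisfies_spec(b, users))  # True True
print([u.name for u in a])  # ['kim', 'zoe', 'adam']
print([u.name for u in b])  # ['kim', 'adam', 'zoe']
```

Tie order is only one unspecified behavior; the same gap shows up for error messages, rounding, pagination, and anything else the prose spec never mentions.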