If you take the compile idea seriously, the next step is to give prompts the same structure code has: separate the role from the context from the constraints from the output spec. Then compile that into XML for the model.
I built flompt (https://github.com/Nyrok/flompt) as a tool for this. It's a canvas where you place typed blocks (role, objective, constraints, output format, etc.) and compile them to structured XML. Basically an IDE for prompts, not a text editor. A star would help a lot if this resonates.
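To make the "compile" framing concrete, here's a minimal sketch of the idea (hypothetical function and tag names, not flompt's actual implementation): typed blocks go in, a structured XML prompt comes out, with a fixed block order so the result is deterministic.

```python
from xml.sax.saxutils import escape

# Hypothetical sketch, not flompt's real format: compile typed
# prompt blocks into a structured XML prompt for the model.
def compile_prompt(blocks: dict[str, str]) -> str:
    # A fixed ordering keeps the compiled prompt deterministic,
    # regardless of the order blocks were placed on the canvas.
    order = ["role", "objective", "constraints", "output_format"]
    parts = ["<prompt>"]
    for tag in order:
        if tag in blocks:
            parts.append(f"  <{tag}>{escape(blocks[tag])}</{tag}>")
    parts.append("</prompt>")
    return "\n".join(parts)

print(compile_prompt({
    "role": "Senior code reviewer",
    "constraints": "Only comment on correctness & style.",
}))
```

The point of the separation is that each concern (who the model is, what it must do, what it must not do, what shape the answer takes) can be edited independently and recompiled, the same way you'd edit one function without touching the rest of a program.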
Translating a natural language spec into code involves a truly massive amount of decision making, because the spec is ambiguous. For any non-trivial program, two implementations of the same natural language spec will have thousands of observable differences.
Where we are today (agents still require guardrails to keep from spinning out), there is no way to let agents work on code autonomously, or to constantly recompile specs, without all of those observable differences constantly shifting, resulting in unusable software.
Tests can’t prevent this: for a test suite to cover all observable behavior, it would need to be more complex than the code itself, in which case it wouldn’t be any easier for a machine or a human to understand. The only solution to this problem is for LLMs to get better.
Personally I think that at the point they can pull this off, they can do any white collar job, and there’s no point in planning for that future because it results in either Mad Max or Star Trek.
Other than that, you seem to be arguing against someone other than me. I certainly agree that agents / existing options would be chaotic hell to use this way. But I think the high-level idea has some potential, independent of that.
I think we’ll either get to the point where AI is so advanced it replaces the manager, the PM, the engineer, the designer, and the CEO, or we’ll keep using formal languages to specify how computers should work.
My Git history contains links between the false starts and misunderstandings and the corrections, and the corrections also include a paragraph on why this was a misunderstanding or false start. That is a lot better than the single linear log you get from LLMs.
And maybe there is a way to trim the parts out of it that are not needed... like to automatically produce an initial prompt which is equivalent to the results of a longer session, but is precise enough so as to not need clarification upon reprocessing it. Something like that? I'm not sure if that's something that already exists.
Why would you think this, though? There are an infinite number of programs that satisfy any non-trivial spec.
We have theoretical solutions to LLM non-determinism, but we have no theoretical solutions to prompt instability, especially when we can’t even measure what “correct” is.
Compilers use rigorous modeling to guarantee semantic equivalence, and that is only possible because they are translating between formal languages.
A natural language spec can never be precise enough to specify all possible observable behaviors, so your bot swarm trying to satisfy the spec is guaranteed to constantly change observable behaviors.
This gets exposed to users as churn, jank, and workflow-breaking bugs.