Often it seems like tech maximalists are the most against tech reliability.
Imagine that - you got your project done ahead of schedule (which looks great on your OKRs) AND finally achieved your dream of no longer being dependent on those stupid overpaid, antisocial software engineers, and all it cost you was the company's reputation. Boeing management would be proud.
Lots of business leaders will do the math and decide this is the way to operate from now on.
I suggest when their pointer dereferences, it can go a bit forward or backwards in memory as long as it is mostly correct.
Then my job changed: I would be assigned a larger implementation and, depending on how large it was, I had to design specifications for others to do some or all of the work and then validate the final product for correctness. I definitely didn’t pore over every line of code - especially not for front end work, which I stopped doing around the same time.
The same is true for LLMs. I treat them like junior developers and am slowly starting to treat them like halfway competent mid level ticket takers.
No. LLMs are undefined behavior.
But most LLM services on purpose introduce randomness, so you don’t get the same result for the same input you control as a user.
"Deterministic" is not the right constraint to introduce here. Plenty of software is non-deterministic (such as LLMs! But also, consensus protocols, request routing architecture, GPU kernels, etc) so why not compilers?
What a compiler needs is not determinism, but semantic closure. A system is semantically closed if the meanings of its outputs are fully defined within the system, correctness can be evaluated internally and errors are decidable. LLMs are semantically open. A semantically closed compiler will never output nonsense, even if its output is nondeterministic. But two runs of a (semantically closed) nondeterministic compiler may produce two correct programs, one being faster on one CPU and the other faster on another. Or such a compiler can be useful for enhancing security, e.g. by emitting programs that behave identically but resist fingerprinting.
Nondeterminism simply means the compiler selects any element of an equivalence class. Semantic closure ensures the equivalence class is well‑defined.
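To make the "any element of an equivalence class" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: `DOUBLING_CANDIDATES`, `is_equivalence_class` and `pick_lowering` are illustrative names, not part of any real compiler, and equivalence is only checked on a finite range of inputs rather than proven.

```python
import random

# Hypothetical equivalence class: three ways to "lower" the expression x * 2.
# Each entry is (label, implementation); all must agree on every input.
DOUBLING_CANDIDATES = [
    ("multiply", lambda x: x * 2),
    ("add", lambda x: x + x),
    ("shift", lambda x: x << 1),
]


def is_equivalence_class(candidates, inputs):
    """Check that every candidate matches the first one on every test input."""
    reference = candidates[0][1]
    return all(impl(x) == reference(x) for _, impl in candidates for x in inputs)


def pick_lowering(candidates, rng):
    """Nondeterministic choice: any member of the class is an acceptable output."""
    return rng.choice(candidates)


assert is_equivalence_class(DOUBLING_CANDIDATES, range(-1000, 1001))

label, impl = pick_lowering(DOUBLING_CANDIDATES, random.Random())
print(label, impl(21))  # the label may vary between runs; the value is always 42
```

The choice of label is the nondeterministic part; the check in `is_equivalence_class` is what stands in for semantic closure in this toy.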
That a compiler might pick among different specific implementations in the same equivalency class is exactly what you want a multi-architecture optimizing compiler to do. You don't want it choosing randomly between different optimization choices within an optimization level, that would be non-deterministic at compile time and largely useless assuming that there is at most one most optimized equivalent. I always want the compiler to choose to xor a register with itself to clear it if that's faster than explicitly setting it to zero if that makes the most sense to do given the inputs/constraints.
There are legitimate compiler use cases, e.g. search‑based optimization, superoptimization, diversification etc., where reproducibility is not the main constraint. It's worth leaving conceptual space for those use cases rather than treating deterministic output as a defining property of all compilers.
You are attempting to hedge and leave room for a non-deterministic compiler, presumably to argue that something like vibe-compilation is valuable. However, you've offered no real use cases for a non-deterministic compiler, and I assert that such a tool would largely be useless in the real world. There is already a huge gap between requirements gathering, the expression of those requirements, and their conversion into software. Adding even more randomness at the layer of translating high level programming languages into low level machine code would be a gross regression.
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
I am not. To me that describes a debugging fiasco. I don't want "semantic closure," I want correctness and exact repeatability.
Meanwhile, you press the "shuffle" button, and code-gen creates different code. But this isn't necessarily the part that's supposed to be reproducible, and isn't how you actually go about comparing the output. Instead, maybe two different rounds of code-generation are "equal" if the test-suite passes for both. Not precisely the equivalence-class stuff parent is talking about, but it's a simple way of thinking about it that might be helpful.
On a practical level, existing implementations are nondeterministic because they don't take care to always perform mathematically associative operations in the same order every time. Floating-point arithmetic is not associative, so those variations change the output. It's absolutely possible to fix this and perform the operations in the same order every time; implementors just don't bother. It's not very useful, especially when almost everything runs with a non-zero temperature.
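The order-of-operations point is easy to reproduce on a CPU. The following is a small Python illustration using IEEE 754 doubles; `sum_in_order` is a hypothetical helper standing in for a reduction kernel, not code from any inference stack.

```python
# Addition of real numbers is associative; IEEE 754 double addition is not.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False


def sum_in_order(values, order):
    """Accumulate values in the given index order, as a reduction might."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [1e16, 1.0, -1e16, 1.0]

# Same inputs, same order: identical result every time.
assert sum_in_order(values, [0, 1, 2, 3]) == sum_in_order(values, [0, 1, 2, 3])

# Same inputs, different order: the result changes.
print(sum_in_order(values, [0, 1, 2, 3]))  # 1.0
print(sum_in_order(values, [0, 2, 1, 3]))  # 2.0
```

Fixing the order, as in the first assertion, is what makes the reduction repeatable.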
I think the whole nondeterminism thing is overblown anyway. Mathematical nondeterminism and practical nondeterminism aren't the same thing. With a compiler, it's not just that identical input produces identical output. It's also that semantically identical input produces semantically identical output. If I add an extra space somewhere whitespace isn't significant in the language I'm using, this should not change the output (aside from debug info that includes column numbers, anyway). My deterministic JSON decoder should not only decode the same values for two runs on identical JSON; a change in one value in the input should also produce the same values in the output except for the one that changed.
LLMs inherently fail at this regardless of temperature or determinism.
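The JSON decoder property described above can be checked directly. This is a sketch using Python's standard `json` module; the sample documents and the `changed_keys` helper are made up for illustration.

```python
import json

original = '{"name": "gc", "reproducible": true, "level": 2}'
reformatted = '{ "name":"gc",\n  "reproducible":true,   "level":2 }'
one_change = '{"name": "gc", "reproducible": true, "level": 3}'


def changed_keys(before, after):
    """Return the sorted keys whose decoded values differ between two JSON objects."""
    a, b = json.loads(before), json.loads(after)
    return sorted(
        k for k in a.keys() | b.keys()
        if k not in a or k not in b or a[k] != b[k]
    )


# Insignificant whitespace changes nothing in the decoded output.
print(changed_keys(original, reformatted))  # []

# Changing one input value changes exactly one output value.
print(changed_keys(original, one_change))   # ['level']
```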
No, a compiler needs determinism. The article is quite correct on this point: if you can't trust that the output of a tool will be consistent, you can't use it as a building block. A stochastic compiler is simply not fit for purpose.
There are even efforts to guarantee this for many packages on Linux. It’s a core property for security, because anyone can rebuild from scratch and verify that the compilation process or environment wasn’t tampered with.
Now actually managing to fix all inputs and getting deterministic output can be challenging, but that’s less to do with the compiler and more to do with the challenge of completely pinning down the entire environment: the profile you are using for PGO, paths on the build machine being injected into the binary, and programs that have something non-deterministic in their source or build system (e.g. incorporating the build time into the binary).
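As an illustration of the build-time case, here is a toy Python sketch. `toy_build` and `digest` are hypothetical stand-ins for a real build step; `SOURCE_DATE_EPOCH` is the environment variable convention used by the Reproducible Builds project for pinning embedded timestamps, and the `clock` parameter exists only so the effect can be shown without waiting.

```python
import hashlib
import time


def toy_build(source, env, clock=time.time):
    """Hypothetical 'build' step: returns artifact bytes with an embedded timestamp."""
    stamp = env.get("SOURCE_DATE_EPOCH")
    if stamp is None:
        stamp = str(int(clock()))  # wall-clock time leaks into the artifact
    return source.encode() + b"\nbuilt-at:" + stamp.encode()


def digest(artifact):
    """SHA-256 hex digest of the artifact bytes."""
    return hashlib.sha256(artifact).hexdigest()


source = "int main() { return 0; }"

# Unpinned: two builds at different times give different artifacts.
early = digest(toy_build(source, {}, clock=lambda: 1_700_000_000))
late = digest(toy_build(source, {}, clock=lambda: 1_700_000_060))
print(early == late)  # False

# Pinned: the clock no longer matters.
pinned = {"SOURCE_DATE_EPOCH": "1700000000"}
a = digest(toy_build(source, pinned, clock=lambda: 1))
b = digest(toy_build(source, pinned, clock=lambda: 2))
print(a == b)  # True
```

In a real build the `env` argument would be the process environment (for example `os.environ`) rather than a hand-written dictionary.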
Hence why it is hard to do benchmarks with various kinds of GC and dynamic compilers.
You can't even expect deterministic code generation for the same source code across various compilers.
> PGO seems like it ought to have a random element.
PGO should be deterministic based on the runs used to generate the profile. The runs are tracking information that should be deterministic--how many times does the branch get taken versus not taken, etc. HWPGO, which relies on hardware counters to generate profiling information, may be less deterministic because the hardware counters end up having some statistical slip to them.
or does your binary always come out differently each time you compile the same file??
You can try it. Try to compile the same file 10 times and diff the resultant binaries.
Now try to prompt a bunch of LLMs 10 times and diff the returned rubbish.
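The "run it 10 times and diff" experiment can be scripted. This is a rough Python sketch, assuming the build is a single command that writes one output file; `build_digests` and `is_reproducible` are hypothetical helper names.

```python
import hashlib
import subprocess
from pathlib import Path


def build_digests(command, output, runs=10):
    """Run a build command repeatedly; return the set of SHA-256 digests of its output."""
    digests = set()
    for _ in range(runs):
        subprocess.run(command, check=True)
        digests.add(hashlib.sha256(Path(output).read_bytes()).hexdigest())
    return digests


def is_reproducible(command, output, runs=10):
    """True if every run produced a byte-identical output file."""
    return len(build_digests(command, output, runs)) == 1


# Example call (assumes a C compiler named `cc` on PATH and a file hello.c,
# both of which are placeholders):
#   is_reproducible(["cc", "-c", "hello.c", "-o", "hello.o"], "hello.o")
```

The same harness works for any command that writes its result to a file, so it could wrap a script that prompts a model and saves the response.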
There's this really good blog post about how autovectorization is not a programming model https://pharr.org/matt/blog/2018/04/18/ispc-origins
The point is that you want to reliably express semantics in the top level language, tool, API etc. because that's the only way you can build a stable mental model on top of that. Needing to worry about if something actually did something under the hood is awful.
Now of course, that depends on the level of granularity YOU want. When writing plain code, even if it's expressively rich in the logic and semantics (e.g. C++ template metaprogramming), sometimes I don't necessarily care about the specific linker and assembly details (but sometimes I do!)
The issue I think is that building a reliable mental model of an LLM is hard. Note that "reliable" is the key word - consistent. Be it consistently good or bad. The frustrating thing is that it can sometimes deliver great value and sometimes brick horribly and we don't have a good idea for the mental model yet.
To constrain said possibility space, we tether to absolute memes (LLMs are fully stupid or LLMs are a superset of humans).
Idk where I'm going with this
Humans, in all their non deterministic brain glory, long ago realized they don't want their software to behave like their coworkers after a couple of margaritas.
They are designed to be where temperature=0. Some hardware configurations are known to defy that assumption, but when running on perfect hardware they most definitely are.
What you call compilers are also nondeterministic on 'faulty' hardware, so...
To say the least, this is garbage compared to compilers
When isn't that true?
  #include <stdio.h>
  int main() {
    printf("Continue?\n");
  }

and

  #include <stdio.h>
  int main() {
    printf("Continue?\n");
    printf("Continue?\n");
  }

do not see the compiler produce equivalent outputs, and I am not sure how they ever could. They are not equivalent programs. Adding additional instructions to a program is expected to change what the compiler does with the program.
With LLMs the output depends on the phases of the moon.
As with LLMs, unless you ask for the output to be nondeterministic. But any compiler can be made nondeterministic if you ask for it. That's not something unique to LLMs.
> With LLMs the output depends on the phases of the moon.
If you are relying on a third-party service to run the LLM, quite possibly. Without control over the hardware, configuration, etc. then there is all kinds of fuckery that they can introduce. A third-party can make any compiler nondeterministic.
But that's not a limitation of LLMs. By design, they are deterministic.
Not unique as in: no one makes their compilers deterministic, and you have to work to make a non-deterministic one. LLMs are non-deterministic by default, and you have to contort them to the point of uselessness to make them deterministic
> If you are relying on a third-party service to run the LLM, quite possibly. Without control over the hardware, configuration, etc.
Again. Even if you control everything, the only time they produce deterministic output is when they are completely neutered:
- workaround for GPUs with num_thread 1
- temperature set to 0
- top_k to 0
- top_p to 0
- context window to 0 (or always do a single run from a new session)
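For readers unfamiliar with what the temperature setting in that list does, here is a toy Python sketch of next-token selection. It is not how any particular model or service is implemented: `next_token` and the `logits` values are invented for illustration, and top_k, top_p and the GPU threading issue are left out entirely.

```python
import math
import random


def next_token(logits, temperature, rng):
    """Pick a token index from raw scores; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # softmax numerators, stabilised
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.1]  # hypothetical scores for three candidate tokens

# Temperature 0: the random source is never consulted, so the pick never varies.
greedy = {next_token(logits, 0, random.Random()) for _ in range(100)}
print(greedy)  # {0}

# Temperature 1: the pick depends on the random source.
sampled = {next_token(logits, 1.0, random.Random(seed)) for seed in range(100)}
print(len(sampled) > 1)  # True
```

The random source is passed in explicitly so that it is visible where the variation comes from.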
Go (gc) was specifically designed to produce reproducible builds, so clearly that's not true, but you are right that it isn't the norm.
Some of the most widely recognized and used compilers, like gcc, clang, even rustc, are nondeterministic. If you work hard and control all the variables (e.g. -frandom-seed), you can make these compilers deterministic, but, hey, if you work hard you can make LLMs deterministic too.