If you take an LLM that makes 10 tool calls in a row for an evaluation, any reduction in unpredictable drift is welcome. The same applies to running your prompt through the DSPy optimizer. [0] There are countless other examples: basically any situation where you are in control of the prompt, the token-level input to the LLM, so there's no fuzziness.
In that case, if you've eliminated token-level fuzziness and can guarantee that you're not introducing any from your own end, you can map out a much more reliable tree or graph of your system's behavior (sketched below).
[0]: https://dspy.ai/#2-optimizers-tune-the-prompts-and-weights-o...
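A minimal sketch of that idea, assuming generation is made deterministic (greedy / temperature-0 decoding); `fake_llm` is a hypothetical stand-in for the real model call, not any actual API:

```python
# Sketch: if generation is deterministic, every (state, prompt) pair maps to
# exactly one next state, so a multi-step run can be recorded as edges in a
# graph and replayed identically. `fake_llm` stands in for a model called
# with greedy decoding.
import hashlib

def fake_llm(prompt: str) -> str:
    # Deterministic stand-in: same prompt -> same "completion".
    return hashlib.sha256(prompt.encode()).hexdigest()[:8]

def run_chain(task: str, steps: int = 3):
    edges, state = [], task
    for _ in range(steps):
        nxt = fake_llm(state)          # with fuzziness removed, this edge never changes
        edges.append((state, nxt))
        state = nxt
    return edges

assert run_chain("evaluate tool use") == run_chain("evaluate tool use")  # identical every run
print(run_chain("evaluate tool use"))
```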
Why use an ambiguous natural language for a specific technical task? I get that it's a cool trick, but surely they can come up with another input method by now?
Since I'm really only looking to sample the top ~10 tokens, and I mostly test on CPU-based inference of 8B models, there's probably not much risk of getting a different ordering of the top tokens due to the hardware implementation, but I'm still going to take a look at it eventually and build in guard conditions against any choice that would be flipped by an epsilon of precision loss.
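A rough illustration of the kind of guard condition meant here (the function name, k, and epsilon are made up for the example): only trust a top-k ranking when neighbouring candidates are separated by more than some epsilon, since a smaller gap could flip under a different hardware or math-library implementation:

```python
# Sketch of a guard against precision-sensitive token choices.
import numpy as np

def top_k_with_margin_guard(logits, k=10, eps=1e-3):
    """Return the top-k token ids plus a flag saying whether the ranking is
    'safe', i.e. no adjacent pair in the top-k is closer than eps."""
    idx = np.argsort(logits)[::-1][:k]      # top-k ids, highest logit first
    top = logits[idx]
    margins = top[:-1] - top[1:]            # gaps between neighbouring ranks
    safe = bool(np.all(margins > eps))      # a tiny gap could flip on other hardware
    return idx, safe

rng = np.random.default_rng(0)
logits = rng.normal(size=32000).astype(np.float32)
ids, safe = top_k_with_margin_guard(logits)
print(ids[:5], "ranking stable within eps:", safe)
```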
This nonlinear and chaotic behavior, regardless of the implementation details of the black box, makes an LLM seem nondeterministic. But an LLM is just a pseudorandom number generator driven by a probability distribution.
(As I write this on my iPhone with text completion, I can see this nondeterministic behavior.)
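A toy sketch of that framing, with a made-up five-token vocabulary and distribution: once the seed and the distribution are fixed, the "random" draw is fully reproducible:

```python
# Sketch: next-token sampling as a seeded PRNG draw from a distribution.
# The vocabulary and probabilities are invented for illustration.
import numpy as np

vocab = ["the", "a", "cat", "dog", "sat"]
probs = np.array([0.4, 0.25, 0.15, 0.12, 0.08])

def sample_token(seed):
    rng = np.random.default_rng(seed)
    return rng.choice(vocab, p=probs)

# Same seed, same distribution -> same token every time.
print([sample_token(42) for _ in range(3)])   # three identical outputs
print([sample_token(7) for _ in range(3)])    # identical again, possibly a different token
```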
If I want to convert "how do I x" to `api.howTo("x")`, it is very important that I get the exact same result every time.
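One way to get that guarantee, sketched below under the assumption that you only extract the argument (by rule, or by a model run greedily) and build the `api.howTo(...)` string yourself; `extract_topic` and `to_api_call` are hypothetical names, not part of any real API:

```python
# Sketch: extract only the variable part, generate nothing else verbatim,
# so the surrounding call syntax can never drift between runs.
import re

def extract_topic(question: str) -> str | None:
    m = re.match(r"\s*how do i (.+?)\??\s*$", question, flags=re.IGNORECASE)
    return m.group(1) if m else None

def to_api_call(question: str) -> str | None:
    topic = extract_topic(question)
    if topic is None:
        return None
    return f'api.howTo("{topic}")'   # the call syntax is fixed, not generated

print(to_api_call("How do I reset my password?"))  # api.howTo("reset my password")
```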
Today we have an extremely hacky workaround that ensures the desired chunk from the RAG step is at least selected, but it's far from ideal and our code is not well written (a temporary POC written by AI that has been sitting there for quite a few months now ...).
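For illustration, something like the sketch below is roughly what such a workaround might look like; the `KeywordRetriever` and `pinned_chunk` names are assumptions for the example, not the actual code:

```python
# Sketch: after retrieval, force a known "must have" chunk into the context
# if the retriever missed it.
class KeywordRetriever:
    """Toy stand-in for a real vector store, just for the example."""
    def __init__(self, docs):
        self.docs = docs

    def search(self, query, k=5):
        words = query.lower().split()
        scored = sorted(self.docs, key=lambda d: -sum(w in d.lower() for w in words))
        return scored[:k]

def build_context(query, retriever, pinned_chunk, k=5):
    chunks = retriever.search(query, k=k)           # normal top-k retrieval
    if pinned_chunk not in chunks:
        chunks = [pinned_chunk] + chunks[: k - 1]   # pin it, drop the weakest hit
    return "\n\n".join(chunks)

retriever = KeywordRetriever(["billing FAQ", "shipping policy", "password reset steps"])
# Even if retrieval misses it, the pinned chunk ends up in the context.
print(build_context("where is my invoice", retriever, pinned_chunk="password reset steps", k=2))
```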