LLMs are fundamentally non-deterministic. Using them on problems that need deterministic answers is picking the wrong tool for the job, and expecting them to be 100% reliable is the wrong mindset for working with them.
You can get Claude Code to fulfill some interface contract with near certainty. Exactly how it does that will vary between runs.
So to me the more interesting question is, what exactly is it you care about inside the sausage, and how do you verify that it's there in the right amounts?
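To make "verify what's inside the sausage" concrete, here's a rough sketch of checking only the contract and ignoring everything else about how the output was produced. The field names in `REQUIRED_FIELDS` and the `check_contract` helper are made up for illustration; a real contract would be whatever you actually care about.

```python
import json

# Hypothetical contract: the fields and types we actually care about.
REQUIRED_FIELDS = {"name": str, "retries": int, "tags": list}


def check_contract(raw: str) -> list[str]:
    """Return a list of contract violations; an empty list means the output passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif type(data[field]) is not expected:
            # Exact type check, so a JSON boolean does not pass as an int.
            problems.append(f"wrong type for {field}: {type(data[field]).__name__}")
    return problems


# Two runs that differ in key order and extras both satisfy the same contract.
run_a = '{"name": "job", "retries": 3, "tags": []}'
run_b = '{"tags": ["x"], "retries": 0, "name": "other", "extra": true}'
print(check_contract(run_a), check_contract(run_b))
```

The point is that the check is deterministic even though the thing being checked isn't.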
Although obviously I'd rather they stay non-deterministic.
The short version: let logic decide, and if there are multiple solutions let the model reason within a fixed range grounded by GraphRAG. Test every output against the logic, re-parse on contradiction, and emit 'unsure' after a couple of iterations rather than guessing.
It's no use for general knowledge. But where the judgement is largely codifiable it holds up well. There's an edge case out there that'll turn it to custard, I just haven't found it yet.
I've connected Claude Desktop to it over MCP and the results are good, not great. I designed the thing so I'm working in the sweet spot and there's still the occasional WTF.
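For anyone who wants the shape of that loop, here's a minimal sketch of "logic decides, model picks within a fixed range, test, retry, then give up". It is not the actual system: `propose` is a hypothetical stand-in for the model plus GraphRAG grounding, `is_consistent` stands in for the logic check, and `max_attempts` is an assumed cap.

```python
from typing import Callable

UNSURE = "unsure"


def decide(
    candidates: list[str],
    propose: Callable[[list[str], int], str],
    is_consistent: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Pick an answer from `candidates`, or return UNSURE instead of guessing."""
    # If the logic leaves exactly one option, the model is never consulted.
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        return UNSURE

    for attempt in range(max_attempts):
        # Hypothetical model call: it only sees the allowed range.
        answer = propose(candidates, attempt)
        # Reject anything outside the fixed range or contradicting the logic.
        if answer in candidates and is_consistent(answer):
            return answer
    return UNSURE


# Fake proposer for illustration: first answer is out of range, second is valid.
def fake_propose(options: list[str], attempt: int) -> str:
    return "z" if attempt == 0 else options[-1]


print(decide(["a", "b"], fake_propose, lambda ans: ans != "a"))
```

The model never gets the last word here: every proposal goes back through the same deterministic check, and running out of attempts produces an explicit `unsure`.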
Theoretically, if you could specify a seed and the exact version of the model, the output should always be the same. I wonder if this is possible with any open-weight models today?
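As a toy illustration of the idea (not a real model), here's a sketch where the "weights" are a fixed dictionary of logits and sampling uses a privately seeded RNG. `sample_tokens` and the logits are invented for the example. Real inference stacks may still vary for other reasons, for example the order of parallel floating-point operations, so treat this as the theoretical case only.

```python
import math
import random


def sample_tokens(logits: dict[str, float], seed: int, n: int, temperature: float = 1.0) -> list[str]:
    """Draw n tokens from a softmax over fixed logits using a seeded RNG."""
    rng = random.Random(seed)  # private RNG, so the result depends only on the arguments
    tokens = sorted(logits)  # fixed order, so dict ordering cannot change the draw
    weights = [math.exp(logits[t] / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


# Hypothetical "model": same logits plus same seed gives the same sequence.
toy_logits = {"yes": 2.0, "no": 1.0, "maybe": 0.5}
first = sample_tokens(toy_logits, seed=42, n=10)
second = sample_tokens(toy_logits, seed=42, n=10)
print(first == second)
```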
---
On a more practical level, scripts (small programs) are deterministic, so having the coding agent write (and possibly reuse) scripts might help.
Not gonna be 100%, but it reduces most issues significantly and helps with debugging.