by prng2021 18 hours ago

Can you add two arbitrary N-digit numbers with 100% accuracy (no tools)? No, you can't. You will make mistakes for a sufficiently large N even with pen and paper, and for a very small N without. Are you no longer generally intelligent?
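To put a rough number on that, here is a back-of-the-envelope sketch. It assumes each digit column is added correctly with the same independent probability; the per-digit error rate of 0.001 is a made-up illustration, not a measured human or model figure.

```python
def p_all_correct(n_digits: int, per_digit_error: float) -> float:
    """Probability that every column of an n-digit addition is right,
    assuming independent, identical per-column error rates
    (an illustrative assumption, not a measured model of people or LLMs)."""
    if n_digits < 0 or not 0.0 <= per_digit_error <= 1.0:
        raise ValueError("need n_digits >= 0 and 0 <= per_digit_error <= 1")
    return (1.0 - per_digit_error) ** n_digits


if __name__ == "__main__":
    # Hypothetical error rate of 1 slip per 1000 columns.
    for n in (10, 100, 1000, 10000):
        print(f"N={n:>5}: P(all columns correct) = {p_all_correct(n, 0.001):.4f}")
```

Under that assumption the chance of a fully correct sum decays geometrically with N, so any nonzero per-digit error rate eventually loses.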

by undersuit 6 hours ago

Somewhere along the line a $10,000 GPU has to be equivalent to using a finger to do arithmetic in the dust.

by sambapa 15 hours ago

No, but I can develop methods to eventually do it.

by wmf 18 hours ago

LLMs should use tool calling (which is 100% reliable) instead of doing math internally. But in general it would be nice to be able to teach a process and have the AI execute it deterministically. In some sense, reliability between 99% and 100% is the worst because you still can't trust the output but the verification feels like wasted effort. Maybe code gen and execution will get us there.
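A minimal sketch of what such a tool could look like on the receiving end, assuming the model emits an arithmetic expression as a string. The name `calc_tool` and the operator whitelist are hypothetical choices for illustration, not any particular vendor's tool-calling API. It walks the parsed expression with Python's `ast` module instead of calling `eval`, and relies on Python's arbitrary-precision integers for exact results.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected. Exponentiation is left
# out on purpose so a caller cannot request an astronomically large result.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval(node: ast.AST) -> int:
    # Integer literals only; bools and floats are rejected.
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")


def calc_tool(expression: str) -> int:
    """Hypothetical tool entry point: evaluate integer arithmetic exactly."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


if __name__ == "__main__":
    print(calc_tool("123456789012345678901234567890 + 987654321098765432109876543210"))
```

The same input always produces the same output, which is the property being asked for. The model can still get the expression itself wrong, so this only moves the reliability question from the arithmetic to the call.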

by base76 11 hours ago

This is the exact problem CognOS was built to solve.

99% reliable means you still can't remove the human from the loop, because you never know which 1% you're in. The only way to actually trust output is to attach a verifiable confidence signal to each response, not just hope the aggregate accuracy holds.

We built a local gateway that wraps every LLM output with a trust envelope: decision trace, risk score, and an explicit PASS/REFINE/ESCALATE/BLOCK classification. The point isn't to make LLMs more accurate; it's to make their uncertainty legible so the human knows when to step in.

Open source if you want to look at the architecture: github.com/base76-research-lab/operational-cognos
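For readers who want the shape of the idea without opening the repo, here is a rough sketch of a trust envelope. It is an illustration only, not the project's actual code or API: the class and function names, the 0-to-1 risk scale, and the thresholds are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    REFINE = "REFINE"
    ESCALATE = "ESCALATE"
    BLOCK = "BLOCK"


@dataclass
class TrustEnvelope:
    output: str        # the raw LLM response being wrapped
    risk_score: float  # assumed scale: 0.0 (safe) to 1.0 (risky)
    verdict: Verdict
    decision_trace: list[str] = field(default_factory=list)


def classify(risk_score: float) -> Verdict:
    """Map a risk score to a verdict. Thresholds are hypothetical."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if risk_score < 0.2:
        return Verdict.PASS
    if risk_score < 0.5:
        return Verdict.REFINE
    if risk_score < 0.8:
        return Verdict.ESCALATE
    return Verdict.BLOCK


def wrap(output: str, risk_score: float) -> TrustEnvelope:
    """Attach a verdict and a minimal trace to one response."""
    verdict = classify(risk_score)
    trace = [f"risk_score={risk_score:.2f}", f"verdict={verdict.value}"]
    return TrustEnvelope(output, risk_score, verdict, trace)


if __name__ == "__main__":
    print(wrap("2 + 2 = 4", 0.05))
    print(wrap("Transfer the funds now.", 0.9))
```

How the risk score itself is produced is the hard part and is deliberately left out of this sketch.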

by base76 12 hours ago

"reliability between 99% and 100% is the worst because you still can't trust the output"