Well, what is your definition of "super reliable in the output", and is it a quantifiable/measurable target or just a feeling?
Is it "more than humans", "more than senior developers", "almost perfect", "perfect"?
> It might behave differently than specified and a human is required to validate every output carefully or else.
Sure, just like meatbag developers. All the security flaws AI finds today were introduced years or decades ago by humans, and (as far as we know) went unnoticed by humans for ages.
Between ten thousand runs of:
```
const int MAX_COUNT = 10000;
printf("I'll count up to %d\n", MAX_COUNT);
for (int i = 1; i <= MAX_COUNT; i++)
    printf("I'm now counting %d\n", i);
```
And of the following prompt:
```
You'll count to 10,000. At the start say "I'll count up to 10,000" and then for each number say "I'm now counting <number>" and do not say anything else. Do not miss numbers in between.
```
Which one is going to produce 100% correct results across 10,000 runs of each?
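For what it's worth, "correct" here is trivially checkable by machine for either tool. A minimal sketch (hypothetical helper name, assuming each run's output is captured as one text transcript):

```python
def is_correct(transcript: str, max_count: int = 10_000) -> bool:
    """Check one run's output against the exact expected lines.

    Expected: a header line, then one line per number from 1 to
    max_count, nothing else and nothing missing.
    """
    lines = transcript.strip().splitlines()
    expected = [f"I'll count up to {max_count}"] + [
        f"I'm now counting {i}" for i in range(1, max_count + 1)
    ]
    return lines == expected
```

Run that over all 10,000 transcripts from each tool and compare the pass counts; for the compiled loop the result is determined by the code, for the prompt it's an empirical sample.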
Now don't give me "these are different tools". We all know. I'm talking about reliability and predictability.