upvote
> I can't prove it with math or logic yet, but I have a feeling that it’ll never happen.

It's not really that hard to actually prove it with math.

It's a computer, so to produce the boolean result (safe or unsafe) there has to be a mathematical formula. This formula will inherently be extremely complex, but even a very simple formula has a huge problem. Suppose "unsafe" is true if X - Y > 0. Make X and Y themselves as simple or complicated as you like but even in the simplest version it's already impossible to calculate unless the model has perfect information.

You can't calculate "X - Y" if you don't know the value of X. And it's indisputable that there is information it doesn't have. Case in point, telling you about a vulnerability in some piece of code is safe (and indeed not telling you is unsafe) if you're the developer and you want to patch it or an administrator and want to mitigate it, but the opposite if you're the attacker and want to exploit it. The model does not know which one you are, therefore it cannot make the correct determination any more than it can solve one equation with two unknowns.

reply
This is why we have courts and juries. Creating laws that cover all cases and contexts is effectively impossible, so we have humans decide what a fair outcome would be in this specific situation.
reply
Imagine how many tokens Claude would burn waiting for litigation, not to mention letting it reconsider now that it understands the problem completely!
reply