upvote
Models not being able to reliably know if they are out of their depth is a foundational limitation of the currently generation of models, though.

Best they can do is to somewhat reliably react to objective signals that they've failed at something (like test failures).

reply