Knowing whether those theorems are the right theorems for the problem can be as difficult as understanding the implementation itself. Hence the example of SeL4 where the number of theorems exceeds lines of code in the original implementation and the formal model is large.
It's my experience that most people doing formal methods have seen cases where they actually proved something slightly different than what they intended to. This usually involves an unintentional assumption that isn't generally true.