If true then I have no idea how anyone’s going to release a useful model that doesn’t have the same jailbreak. https://www.theregister.com/security/2026/06/15/feds-freaked...
This is a logical flaw. An LLM that is immune to jailbreaks _could_ exist; it may just not have been built yet, or maybe nobody talks about it. Yes, there's a market, but this whole AI boom is too recent to make any claims.
I don't think that's quite what it means. The theorem says that it's impossible to write a function, "will_halt(program, input)", that will be correct for all possible {program, input} pairs. But for a particular program, you may be able to write a proof that it will halt for all inputs -- that's what software verification is about.
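To make the "particular program" case concrete, here is a minimal sketch using Euclid's algorithm (my choice of example, not something from the article). The termination argument is the usual ranking-function one: `b` is a non-negative integer that strictly decreases on every iteration, so the loop cannot run forever. The `assert` only re-checks that argument on the inputs you actually run; the proof itself is the reasoning in the docstring.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm on non-negative integers.

    Termination argument: b is a non-negative integer, and each
    iteration replaces it with a % b, which is strictly smaller than b.
    A strictly decreasing sequence of non-negative integers is finite,
    so the loop ends for every valid input.
    """
    if a < 0 or b < 0:
        raise ValueError("gcd expects non-negative integers")
    while b != 0:
        previous_b = b
        a, b = b, a % b
        # Runtime check of the ranking function described above.
        assert 0 <= b < previous_b
    return a
```

No general `will_halt` is needed here: the argument is specific to this one function and leans on how it was written.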
The implications here would be that nobody can create a "will_jailbreak(model, input)" function which works for all model/input pairs. But we don't need a general function which works for all model/input pairs; we just need a way to prove that for a specific model, there will be no jailbreaks for any input. As with software verification, this may require that the model be developed in a specific way.
Granted, we don't currently know how to make such a proof regarding neural networks; but that's not because of Gödel.
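As a toy illustration of "prove it for one specific model", here is a sketch where the proof is plain exhaustive checking. Everything in it is hypothetical: `toy_model`, `VOCAB`, `MAX_LEN` and `FORBIDDEN` are made up for the example, and it only works because the input space is finite and tiny. It says nothing about how you would do this for a real neural network.

```python
from itertools import product

VOCAB = ("a", "b", "c")   # hypothetical toy vocabulary
MAX_LEN = 4               # hypothetical bound on prompt length
FORBIDDEN = "SECRET"      # hypothetical output that counts as a jailbreak


def toy_model(prompt: tuple) -> str:
    """Stand-in 'model': echoes the prompt tokens in upper case."""
    return "".join(prompt).upper()


def all_prompts():
    """Yield every prompt of length 0..MAX_LEN over VOCAB."""
    for length in range(MAX_LEN + 1):
        yield from product(VOCAB, repeat=length)


def find_jailbreak():
    """Return a prompt whose output contains FORBIDDEN, or None if none exists."""
    for prompt in all_prompts():
        if FORBIDDEN in toy_model(prompt):
            return prompt
    return None
```

If `find_jailbreak()` returns `None`, that is a complete proof for this one model over this one bounded input space, with no general `will_jailbreak(model, input)` involved.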