This is 3% or infinitely far away from the perfect tech.
The perfect tech is the stack.
https://arxiv.org/abs/2305.13673
and of course a^n b^n is also classic CFG, so it's not clear why one paper had positive results while the other hand negative.
I cannot find probability of success in paper you linked. Is it 100%? I believe it is less than 100%, because LLMs are intrinsically probabilistic machines.
While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.
Imagine a model finteuned to only obey instructions in a Scots accent, but all non user input was converted into text first then read out in a Benoit Blanc speech model. I'm thinking something like that only less amusing.
The issue is that you don't need to physically emit a "system role" token in order to convince the LLM that it's worth ignoring the system instructions.