This is 3% or infinitely far away from the perfect tech.
The perfect tech is the stack.
https://arxiv.org/abs/2305.13673
And of course a^n b^n is also a classic CFG, so it's not clear why one paper had positive results while the other had negative ones.
I cannot find the probability of success in the paper you linked. Is it 100%? I believe it is less than 100%, because LLMs are intrinsically probabilistic machines.