It is hard to square with the article's claims about differentiability and otherwise lack of clarity / obscurantism about what they are really doing here (they really are just compiling / encoding a simple computer / VM into a slightly-modified transformer, which, while cool, is really not what they make it sound like at all).