There is no training in the usual sense of the term, i.e. no gradient descent and no differentiable loss function. They use deceptive language early on to make it sound that way, but near the end they make it clear that their model, as is, isn't actually differentiable, and that it might in theory still work if made differentiable. They don't actually know.
But IMO this is BS, because it's unclear how one would obtain or generate training data, or how one would define a continuous loss function that scores partially-correct or merely plausible outputs (e.g. is a "partially correct" program / algorithm even a coherent concept?).