I assumed they had to train it; otherwise, how else would they get "inside" a transformer?

The article also smells a bit off to me: it sounds revolutionary while offering no details or clear explanation.

There is no training in the usual sense of the term, i.e. no gradient descent and no differentiable loss function. They use deceptive language early on to make it sound that way, but near the end they make it clear that their model, as is, isn't actually differentiable, and that it might in theory still work if made differentiable. But they don't actually know.

But IMO this is BS, because I don't see how one would obtain or generate training data, or how one would define a continuous loss function that scores partially correct or plausible outputs (e.g., is a "partially correct" program / algorithm / piece of code even a coherent concept?).

I would assume it was manually coded.