My interpretation is that they built a simple virtual machine directly into the weights, then compiled a WASM runtime for that machine, then compiled the solver to that runtime.
reply
The article states that they trained a WASM interpreter and that programs are represented as WASM bytecode.
reply
Nope: they encoded (compiled) a simple VM / WASM interpreter directly into the transformer weights; there is no training involved. You'd be forgiven for the misreading, because early on they deliberately imply that their model is (in principle) trainable, but later admit that the actual model is not differentiable. They only claim that a differentiable approximation "should" still work, with no information about what loss function or training data could score partially correct or incomplete program outputs.
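To make the "encoded into the weights, no training" point concrete, here's a minimal sketch (my own illustration, not from the article): a tiny ReLU network whose weights are written by hand so that it computes XOR exactly. Nothing is learned; the function is compiled into the parameters by construction, which is the same style of construction (at vastly larger scale) as baking an interpreter into transformer weights.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hand-constructed weights -- no training step anywhere.
# Hidden layer: h = relu(W1 @ x + b1), two units.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
# Output layer: y = W2 @ h
W2 = np.array([1.0, -2.0])

def xor(x):
    """XOR(x1, x2) = relu(x1 + x2) - 2 * relu(x1 + x2 - 1)."""
    return float(W2 @ relu(W1 @ np.asarray(x, dtype=float) + b1))

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor(x))  # (0,0)->0.0  (0,1)->1.0  (1,0)->1.0  (1,1)->0.0
```

The network happens to be (piecewise) differentiable, but that's beside the point: its weights encode a fixed program rather than anything fitted to data, which is why "trainable in principle" says little about whether gradient descent could ever have found them.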
reply