My interpretation is that they built a simple virtual machine directly into the weights, then compiled a WASM runtime for that machine, then compiled the solver to that runtime.
reply
The article states that they trained a WASM interpreter and that programs are represented as WASM bytecode.
reply
Nope: they encoded (compiled) a simple VM / WASM interpreter directly into the transformer weights; there is no training involved. You'd be forgiven for the misreading, because early on they deliberately imply that their model is (in principle) trainable, but later admit that the actual model is not differentiable. They only claim that a differentiable approximation "should" still work, with no information about what loss function or training data could score partially correct or incomplete program outputs.
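To make the "encoded into the weights, no training" point concrete, here's a minimal sketch (my own illustration, not from the article): a tiny ReLU network whose weights are written by hand so that it computes XOR exactly. Nothing is learned; the function is compiled into the parameters by construction, which is the same style of construction (at vastly larger scale) as baking an interpreter into transformer weights.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hand-constructed weights -- no training step anywhere.
# Hidden layer: h = relu(W1 @ x + b1), two units.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
# Output layer: y = W2 @ h
W2 = np.array([1.0, -2.0])

def xor(x):
    """XOR(x1, x2) = relu(x1 + x2) - 2 * relu(x1 + x2 - 1)."""
    return float(W2 @ relu(W1 @ np.asarray(x, dtype=float) + b1))

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor(x))  # (0,0)->0.0  (0,1)->1.0  (1,0)->1.0  (1,1)->0.0
```

The network happens to be (piecewise) differentiable, but that's beside the point: its weights encode a fixed program rather than anything fitted to data, which is why "trainable in principle" says little about whether gradient descent could ever have found them.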
reply