Any benchmark comparisons to Fara-7B, Sonnet 4.6, Qwen 3.5, etc.?
In particular, the Forward rollout module seems very important. It aligns what is effectively your world model with what it expects from the world, and I think keeping those two in sync is what gives this the power to generate the state-action pairs it needs to keep training in a semi-supervised way.