We are seeing increasing evidence that these sorts of video world models are terrible options for producing useful rollouts of an environment's physical dynamics. It is hypothesized that you can make them better than simulators by training them on physics simulation data, but then the question becomes: why not use the simulator directly?

There are a lot of areas in the robotics stack where predictive models make sense, but doing it with "video world models", as is trendy this year, is likely a bet in the wrong direction, given the evidence we have been amassing over the last 6 months.