A world-model is a model that predicts the next state of a simulated world given the current state and, optionally, an action from an agent inhabiting that world. It's closely analogous to a language-model predicting the next word.
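To make that concrete, here's a rough sketch of the idea as a plain function: state and optional action in, predicted next state out. Everything in it is made up for illustration (`toy_model`, `rollout`, the 1-D position world); a real world-model would be a learned network, not hand-written rules.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical toy world: the state is an integer position on a line,
# and an action is "left", "right", or None (no action).
def toy_model(position: int, action: Optional[str]) -> int:
    """Stand-in for a learned model: predict the next state."""
    if action == "left":
        return position - 1
    if action == "right":
        return position + 1
    return position  # no action: the world stays as it is

def rollout(
    model: Callable[[int, Optional[str]], int],
    state: int,
    actions: Sequence[Optional[str]],
) -> List[int]:
    """Feed each prediction back in as the next input, the same way a
    language-model appends each predicted word to its own prompt."""
    states = [state]
    for action in actions:
        state = model(state, action)
        states.append(state)
    return states

trajectory = rollout(toy_model, 0, ["right", "right", None, "left"])
print(trajectory)  # [0, 1, 2, 2, 1]
```

The feedback loop in `rollout` is the part that matches the language-model analogy: each predicted state becomes the input for the next prediction.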

That world-state can be anything, but in the last year or two the term has taken on a narrower meaning: a video generation model that reacts naturally to game-like controls, as if it were simulating a videogame. But there's no additional state behind the video frames.
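A rough sketch of what "no additional state behind the video frames" could look like in practice: the only thing the model is given is a window of recent frames plus the current control input. All names here are hypothetical (`toy_frame_model`, `play`, `CONTEXT_FRAMES`), and each "frame" is just an integer camera angle standing in for an image.

```python
from collections import deque
from typing import List, Optional, Sequence

CONTEXT_FRAMES = 3  # hypothetical: how many past frames the model gets to see

def toy_frame_model(recent_frames: Sequence[int], action: Optional[str]) -> int:
    """Stand-in for a learned video model: next frame from recent frames + input.
    Note there is no game engine or hidden world-state argument."""
    delta = {"left": -1, "right": 1}.get(action, 0)
    return recent_frames[-1] + delta

def play(first_frame: int, actions: Sequence[Optional[str]]) -> List[int]:
    """Generate a video one frame at a time in response to control inputs."""
    context = deque([first_frame], maxlen=CONTEXT_FRAMES)
    video = [first_frame]
    for action in actions:
        frame = toy_frame_model(list(context), action)
        context.append(frame)  # the oldest frame drops out once the window is full
        video.append(frame)
    return video

video = play(0, ["right", "right", "left", None])
print(video)  # [0, 1, 2, 1, 1]
```

In this sketch, whatever falls out of `context` is simply gone, since there is no separate state to recover it from. That is an assumption about how such models are typically wired, not a description of any particular one.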

"World" in this context means that these videos are interactive, just like a video game. In the linked examples you can see the keyboard and mouse inputs. The model is trained to maintain about a minute of scene consistency, so you can look around and objects that go out of view will reappear when you look back in that direction.