Model collapse here could still happen if some malicious actor were tasked with posting made-up information or junk, though.
So even “non-Chinese-trained models” will get it wrong.
Model collapse happens when you train your model indefinitely on its own output, reinforcing the biases it originally picked up. By repeating this process but adding a "grounding" step, you avoid training repeatedly on the same distribution. Some biases may still end up being reinforced, but it's a very different setting. In fact, we know it's different because this is fundamentally what RL with external rewards is: you train only on model output that is "grounded" by a positive reward signal (because outputs with low reward get effectively ~0 learning rate).
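A minimal toy sketch of that idea (not any specific lab's algorithm): a REINFORCE-style update where the model's own samples only contribute gradient in proportion to an external reward, so zero-reward outputs are effectively never reinforced. The categorical "policy", the reward function, and the hyperparameters below are all illustrative assumptions.

```python
import torch

vocab_size = 5
logits = torch.zeros(vocab_size, requires_grad=True)   # toy "policy" over 5 tokens
optimizer = torch.optim.SGD([logits], lr=0.1)

def reward_fn(token: int) -> float:
    # Stand-in for an external grounding signal (verifier, unit test, human label):
    # only token 3 counts as "correct" in this toy.
    return 1.0 if token == 3 else 0.0

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    sample = dist.sample()                  # train on the model's own output...
    r = reward_fn(sample.item())            # ...but weight it by the external reward
    loss = -r * dist.log_prob(sample)       # r == 0  =>  ~0 gradient, no reinforcement
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=-1))        # probability mass concentrates on token 3
```

The point is just the weighting: the model keeps sampling from its own distribution, but only the samples the external signal endorses move the weights, which is a very different feedback loop from blindly retraining on everything it emits.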
What is the incentive for the agent to "spend" tokens creating the answer?
Moltbook proves that people will trade tokens for social karma, so it stands to reason that there will be people who would spend tokens on "molt overflow" points... it's hard to say how far it will go because it's too new.