These tokens are almost universally used as stop tokens, which cause generation to stop and return control to the user.
If you didn't do this, the model would happily continue generating user + assistant pairs w/o any human input.
Also, they're usually bracketed by special tokens to distinguish them from "normal" output for both the model and the harness.
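A minimal sketch of what the harness-side loop looks like, using ChatML-style special tokens (`<|im_start|>` / `<|im_end|>`) purely for illustration; the actual tokens and template vary by model:

```python
# Illustrative only: a toy "model" that streams tokens, and a harness
# loop that halts when it sees a stop token instead of letting the
# model roll on into a fabricated user turn.

STOP_TOKENS = {"<|im_end|>"}  # hypothetical ChatML-style stop token

def fake_model_stream():
    # Stand-in for a real model's token stream. Note that without a
    # stop check, it would happily start a new "user" turn on its own.
    yield from ["Sure", ",", " here", " you", " go", ".",
                "<|im_end|>", "<|im_start|>", "user", "\n", "..."]

def generate(stream, stop_tokens):
    out = []
    for tok in stream:
        if tok in stop_tokens:
            break  # stop generation and return control to the user
        out.append(tok)
    return "".join(out)

print(generate(fake_model_stream(), STOP_TOKENS))
# prints "Sure, here you go."
```

Because the stop tokens are special (never produced by tokenizing ordinary text), the harness can match them exactly without worrying that the model's "normal" output collides with them.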
(They can get pretty weird, like in the "user said no but I think they meant yes" example from a few weeks ago. But I think it takes a few rounds of wrong conclusions and motivated reasoning before it gets to that point - it doesn't start out that way.)