Apparently they do, going by the evidence in the NYT v. OpenAI suit.
But the real unsettled issue is whether model training is fair use, and where copyright infringement might creep into model output.
That surely can't be what they argue, because I'm sure I can't translate a copyrighted book into a different language and say "that's fine, it's not word-for-word".