They can license training data. They have trillions; look at what they are dumping into it. You seriously think they can't afford to license data?
Obviously it would have been easier if they had licensed it from the start, but that was their trick: take the data while people weren't noticing and get big ASAP. Should they get away with it?
Also, it would solve their problem with Chinese competitors, because it would make those competitors copyright violators too. Right now it's more like "rules for thee, not for me", so it's hard to take seriously.