Even if the other models were trained on the same data, which is unlikely, since they had less time and money to scrape it and fewer lawyers to protect them if they did something like pirating it, the proprietary models are still largely built on public data and wouldn't exist without it. At the very least, they should release the intermediate model from before training on their proprietary data. Not that that's how that works...
Source? Otherwise this is pure speculation.