Even worse in many cases, because they are so over-engineered that nobody understands how they work.
I'm guessing most of the gains we've seen recently come from post-training rather than pretraining.
But I naively assume most orgs would opt out. I know some orgs have a proxy in place that prevents certain proprietary code from passing through!
This makes me curious whether, in the allow case, Anthropic records generated output, maybe to down-weight it if it shows up in the training data (or something similar)?
Seen this way too often.
Exploits and HFT are the two examples I can think of. Both are usually closed source because of the financial incentives.