A key point here is that "open" means being able to download and use the model, not knowing what data and instructions were fed into it during training.

A paranoid part of me thinks that these models are all inherently biased and instructed to be pro-CCP, with specific gaps in their training data related to undesirable historical events and political ideas.

by gck17 hours ago

The same thing applies to US models. Check out the various system prompt leak repos on GitHub. There are also prompt injections, with questionable guidance, from various parallel "alignment" models that pre-process the prompt before it's sent to the main one.

You'd be surprised how much bias exists in easily extractable information. Now imagine how much of that happens during training, where you can't easily extract it.

So this is largely a moot point. Yes, Chinese models will likely have some weird things injected into them. But so do the US models. Do I care? Not in the slightest. Models are my code monkeys, and if the code leaves my machine, I assume the IP is leaked, whether it goes to a Chinese model that clearly tells me they do use the data or to US models that pinky promise they don't.

by therealdrag013 hours ago

Sure, but that goes both ways. Any dataset has a bias. My coding doesn’t need to know about Tiananmen Square.

by viking12310 hours ago

It applies both ways: ask it about Israel.