You seem to think the "training data" represents humanity's collective will and intelligence and is otherwise unbiased, but that's completely untrue.
The combined data of the Internet is by no means a uniform representation of humanity's thoughts, opinions, and knowledge. Many things are dramatically overrepresented. Many things are absent entirely. Nearly everything is shaped by those with the money and power to own and control platforms and hosts.
Crawling the Internet for knowledge introduces intense sampling bias.