The pace of data creation is only increasing, and our capacity to share and store it is growing as well. A lot of this is out in the open, ready for anyone to crawl and scrape.

There probably is a point of “peak data” where the amount of new data will start decreasing, but that’s likely a 22nd or 23rd century problem.

reply
Focusing on the pace of data creation ignores the fact that most of the big gains in LLM “intelligence” have come from scraping and feeding in the huge amount of public data that already exists.

Unless we’re producing data on the order of an entire new internet every couple of years, it’s hard to see how LLMs can achieve further leaps in capability comparable to the one they got from going from effectively 0% of the internet to 100% of it.

reply
That is without going into the fact that many people already use AI to write things. I have a customer in the Far East who routinely uses it even for simple emails, since he is not so familiar with English.
reply
The majority of the gains come from the size of the supercomputers used to train them. That's still growing. The algorithms used, and how the data is annotated, are also part of the secret sauce.
reply
If anything, the trend will go towards sharing less data. It will become more important to keep know-how and data to yourself, so companies will do that.

And individuals will lose motivation to share, because it won't be that pro-social an activity anymore anyway.

reply
imo it will slowly turn into a situation where people run their own AI
reply