This will happen regardless. LLMs are already ingesting their own output. At the point where AI output becomes the majority of internet content, interesting things will happen. Presumably the AI companies will put lots of effort into finding good training data, and ironically that will probably be easier for code than anything else, since there are compilers and linters to lean on.
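As a toy illustration of "leaning on the compiler" to filter training data, here is a minimal sketch, assuming a Python-only corpus and using only the built-in `compile()` as a cheap syntactic-validity check (the function names are illustrative, not from any real pipeline):

```python
# Hypothetical sketch: keep only candidate training snippets that are at
# least syntactically valid Python. Real pipelines would go further
# (type checkers, linters, running tests), but the principle is the same.

def compiles_ok(source: str) -> bool:
    """Return True if the snippet parses as valid Python syntax."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

def filter_snippets(snippets: list[str]) -> list[str]:
    """Drop snippets that fail the compile check."""
    return [s for s in snippets if compiles_ok(s)]

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b:\n    return a + b\n"  # unbalanced paren -> SyntaxError
print(filter_snippets([good, bad]) == [good])  # True
```

A check like this is far cheaper than judging prose quality, which is why code corpora are comparatively easy to clean.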
I've thought about this and wondered whether this current moment is actually peak AI usefulness: the signal-to-noise ratio is high now, but once training data becomes polluted with its own slop, things could start getting worse, not better.
I was wondering if anyone was doing this after reading about LLM crawlers scraping every single commit on git repos.

Nice. I hope you are generating realistic commits and that the scrapers truly cannot distinguish poison from food.

Refresh this link 20 times to examine the poison: https://rnsaffn.com/poison2/

The cost of detecting/filtering the poison is many orders of magnitude higher than the cost of generating it.
