> Paid out like Spotify pays out artists.

So, mostly to fraudulent AI spam?

AI makes this problem worse in both directions. It makes it fantastically easy to produce "content". So if you're scraping content, or browsing content, you're going to run into increasing amounts of AI. Micropayments make this worse, because they become a means of getting paid to produce spam. The problem comes when you want the "content" to be connected to real questions like "how does my dryer work" or "what is going to happen to oil availability six months from now".

AI trainers didn't pay book authors until forced to. $3,000 ended up being a pretty high value! But it was also a one-off. Everyone writing books from now on is going to have to deal with being free grist to the machine.

reply
> So, mostly to fraudulent AI spam?

Spotify does not pay out mostly to AI spam.

Their pay scales by listens. The AI spam doesn’t collect many listens. The spammers do it because they can automate it and make it low effort, but it’s not a cash cow for the spammers.

reply
An interesting listen about money laundering and spam in streaming services: https://darknetdiaries.com/episode/171/
reply
Spammers do it because it pays out.
reply
> Paid out like Spotify pays out artists.

As others said, Spotify pays shit for artists, but maybe that's the problem with the whole thing here. It should be more like how Bandcamp pays artists (80% to the artists, 20% for Bandcamp), but then the rapacious economy supporting the largest LLM providers would collapse and (wipes away a single tear) we'd all have to use simpler, cheaper, most likely local models.
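To make the contrast concrete, here's a rough sketch of the two payout models: a shared pro-rata pool (the usual description of Spotify-style payouts) versus a per-sale split (the 80/20 figure is from the comment above; all other numbers are made up for illustration):

```python
# Sketch: pro-rata pool payout (Spotify-style) vs. direct-sale split
# (Bandcamp-style). Figures are illustrative, not real platform numbers.

def pro_rata_payouts(pool, streams_by_artist):
    """Split one revenue pool in proportion to each artist's stream count."""
    total = sum(streams_by_artist.values())
    return {a: pool * n / total for a, n in streams_by_artist.items()}

def direct_sale_payout(price, artist_share=0.80):
    """Per-sale split: the artist keeps artist_share, the platform the rest."""
    return price * artist_share

streams = {"big_act": 9_000_000, "small_act": 10_000}
pool_cut = pro_rata_payouts(100_000.00, streams)
# small_act's share of a $100k pool is roughly $111 here,
# while a single $10 direct sale at 80/20 nets them $8.
sale_cut = direct_sale_payout(10.00)
```

The point the pool model makes obvious: a small artist's payout depends on everyone else's stream counts, not just their own listeners, which is why the big players (and bot farms) dominate the split.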

reply
> Paid out like Spotify pays out artists.

That's probably not the best comparison. Spotify only benefits the big players, or rather those with the most bots. If you actually want to support specific artists, you'd have to use Bandcamp or similar sites.

reply
I think most labs actively create synthetic data using existing model as part of the mix for the pretraining stage for their next model.

Would love to know exactly what the latest process is to keep slop out of training data.

reply
I think everyone overblows the whole "AI is poisoning AI!" thing. It could be a problem, but the genuine value in Reddit or any other human social media is honestly pretty low from my estimates. It's great for seeing how humans talk, but in terms of 'nutritional' value for truth or answers... I am not sold. If I were choosing what to 'feed' AI, I wouldn't even bother with textual social media (besides GitHub / GitLab / other source control).

There's way more value, if seeking out answers, in following the links to external sources, scraping books, and other sources that aren't "unwashed masses saying whatever they want".

reply
> the genuine value in Reddit or any other human social media is honestly pretty low from my estimates. It's great for seeing how humans talk but in terms of 'nutritional' value for truth or answers...

> ...

> scraping books, and other sources that aren't "unwashed masses saying whatever they want".

The problem is there's a lot of knowledge that only exists as reddit comments, blog posts, or social Q&A.

reply
You can put it in scare quotes all you want; that doesn't stop you from sounding like Scrooge McDuck.
reply
const isAiContent = (str) => str.includes('—');?

:)
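And in the same joking spirit, a slightly expanded version of that one-liner (still useless as a real detector; the marker list is entirely made up):

```javascript
// Toy heuristic extending the one-liner above: count a few
// "AI-flavored" markers instead of just the em dash. Not a real detector.
const aiMarkers = ['\u2014', 'delve', 'tapestry'];

// Number of markers found in the (lowercased) string.
const aiMarkerScore = (str) =>
  aiMarkers.filter((m) => str.toLowerCase().includes(m)).length;

// Flag as "AI" when at least two markers appear.
const isAiContent = (str) => aiMarkerScore(str) >= 2;
```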

reply
Latest-generation LLMs use en dashes instead of em dashes to avoid detection.
reply
No, they don’t. But obviously GP was tongue–—in-–cheek.
reply
> in danger

It has already done so, and we can be confident in saying that.

Verified content will always be relatively expensive when compared to AI content.

Visits to Wikipedia and most sites have dropped. Rtings has gone full paywall. Ad revenue for producing Verified content will be too meager to allow for public consumption.

There are jokes about GenAI being the great filter; while I doubt that, I do hope this is the final push that makes us think about how we want our information commons to be nurtured.

reply
> Verified content will always be relatively expensive when compared to AI content....

> Visits to wikipedia and most sites have dropped. Rtings has gone full paywall. Ad revenue for producing Verified content will be too meager to allow for public consumption.

AI is a technology that's going to further entrench inequality, by warping incentives to push us further away from democratization. Unless you've got $$$ to drop on verified content, you'll be served prolefeed slop and be that much more ignorant.

reply
At this point, it feels like most technology will be used in favor of people with power, and not in a democratizing manner.

I'd argue that this is more about the state of play than the tech itself.

reply
> I was imagining if LLMs could finally solve the micropayments solution people have always proposed for the internet. Part of my monthly payment gets split between all of the sites that the LLM scraped knowledge from. Paid out like Spotify pays out artists.

As a software user I wish I could do the same for all the software I use.

reply
Many open source projects accept donations. There's also explicitly paid-for software. What exactly do you wish for that you can't do right now?
reply
Specifically the part where engineers get paid the same way as artists on Spotify.
reply
So a handful will make a buttload but the vast majority won't make enough to pay rent?
reply
Certainly that's how open source pans out.
reply
So not at all for their work, and with a reverse Robin Hood model? That would be terrible for software. The way artists get paid on streaming is a genius play at catering to the biggest artists and labels while screwing over the smaller ones, especially true on Spotify with its freemium model.
reply
> I was imagining if LLMs could finally solve the micropayments solution people have always proposed for the internet. Part of my monthly payment gets split between all of the sites that the LLM scraped knowledge from. Paid out like Spotify pays out artists.

This system is usually called taxes.

Which then pay for universal healthcare, free education, affordable housing, libraries, parks, and so on.

LLMs don't need to invent it; we should stop allowing the people and companies behind them to avoid it.

reply