> Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

I think you'll find that this is not settled in the courts, depending on how the data was obtained. If the data was obtained legally, say a purchased book, courts have been finding that using it for training is fair use (Bartz v. Anthropic, Kadrey v. Meta).

Morally the case gets interesting.

Historically, there was no such thing as copyright. England's 1710 Statute of Anne, which established copyright in public law, was titled 'An Act for the Encouragement of Learning', and the US Constitution empowers Congress to secure exclusive rights 'to promote the Progress of Science and useful Arts'; so essentially public benefits driven by the grant of private benefits.

The Moral Bottom Line: if you didn't have to eat, would you care who copies your work as long as you get credited?

The more people copy your work with attribution, the more famous you'll be. Now that's the currency of the future*. [1]

You'll do it for the kudos. [2][3]

  *Post-Scarcity Future. 
  [1] https://en.wikipedia.org/wiki/Post-scarcity
  [2] https://en.wikipedia.org/wiki/The_Quiet_War, et al.
  [3] https://en.wikipedia.org/wiki/Accelerando
reply
> The Moral Bottom Line: if you didn't have to eat, would you care who copies your work as long as you get credited?

Yes.

I have 2 issues with "post-scarcity":

- It often implicitly assumes humanity is one homogeneous group where this state applies to everyone. In reality, if post-scarcity is possible, some people will be lucky enough to have the means to live that lifestyle while others will still be dying of hunger, exposure and preventable diseases. All else being equal, I'd prefer being in the first group, and my best chance of that is staying economically relevant.

- It often ignores that some people are OK with having enough while others have a need to have more than others, no matter how much they already have. The second group is the largest cause of exploitation and suffering in the world. And the second group will continue existing in a post-scarcity world and will work hard to make scarcity a real thing again.

---

Back to your question:

I made the mistake of publishing most of my public code under GPL or AGPL. I regret it because even though my work has brought many people some joy, and a bit of it was perhaps even useful, it has also been used by people who actively enjoy hurting others, who have caused measurable harm and who will continue causing harm as long as they're able to - in some small part enabled by my code.

Permissive licenses are socially agnostic - you can use the work and build on top of it no matter who you are and for what purpose.

(A)GPL is weakly pro-social - you can use the work no matter what, but you can only build on top of it if you give back. This produces a small but non-zero social pressure (enforced by violence through governments) which benefits those who prefer cooperation over competition.

What I want is a strongly pro-social license - you can use or build on top of my work only if you fulfill criteria I specify such as being a net social good, not having committed any serious offenses, not taking actions to restrict other people's rights without a valid reason, etc.

There have been attempts in this direction[0] but not very successful.

In a world without LLMs, I'd be writing code under such a license, more clearly specified, even if I had to write my own. Yes, a lawyer would do a better job; that does not mean anything written by a non-lawyer is completely unenforceable.

With LLMs, I have stopped writing public code at all because, the way I see it, it just makes people already much richer than me even richer, at a much faster rate than I can ever achieve myself. It just makes inequality worse. And with inequality, exploitation and oppression tend to soon follow.

[0]: https://json.org/license.html

reply
> In reality, if post-scarcity is possible, some people will be lucky enough to have the means to live that lifestyle while others will still be dying of hunger, exposure and preventable diseases.

By definition, that's not a post-scarcity world; and that's already today's world.

> It often ignores that some people are OK with having enough while others have a need to have more than others, no matter how much they already have.

Do you think that's genetic, or environmental? Either way, maybe it will have been trained out of the kids.

> it has also been used by people who actively enjoy hurting others, who have caused measurable harm

Taxes work the same way too. "The Good Place" explores these second-order and higher-order effects in a surprisingly nuanced fashion.

Control over the actions of others, you have not. Keep you from your work, let them not.

> What I want is a strongly pro-social license - you can use or build on top of my work only if you fulfill criteria I specify such as being a net social good

These are all things necessary in a society with scarcity. Will they be needed in a post-scarcity society that has presumably solved all disorder that has its roots in scarcity?

> With LLMs, I have stopped writing public code at all because the way I see it, it just makes people much richer than me even richer at a much faster rate than I can ever achieve myself.

Yes, the futility of our actions can be infuriating, disheartening, and debilitating. It brings to mind the story about the chap tossing washed-ashore starfish back one by one. There were thousands. When asked why he bothered with such a futile task, since he couldn't possibly throw them all back, he answered as he threw the next ones: it matters to this one, it matters to this one, ...

Hopefully, your code helped someone. That's a good enough reason to do it.

reply
> trained out of the kids

I don't think you understand how children work.

You probably imagine some Brave New World kind of conditioning. Not to mention, those people will want their kids to have those traits.

> Hopefully, your code helped someone. That's a good enough reason to do it.

No. That's like saying that the V2 rocket program helped keep a bunch of people out of the gas chambers.

We should absolutely do our best to make sure our work does more good than harm, not just that it does some good.

EDIT: I am sad to see your other comment below flagged/dead. HN does not like the idea that a lowly open source contributor could take their phones and computers away from them for petty things like genocide, murder or rape...

reply
[dead]
reply
> I strongly object to anthropomorphising text transformers (e.g. "Assisted-by").

I don't think this is anthropomorphising, especially considering they also include non-LLM tools in that "Assisted-by" section.

We're well past the Turing test now; whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature's, especially when it comes to programming.

reply
> We're well past the Turing test now

Nope, there is no “The” Turing Test. Go read his original paper before parroting pop sci nonsense.

The Turing test paper proposes an adversarial game to deduce whether the interviewee is human. It's extremely well thought out. Seriously, read it. Turing wagered that within about fifty years, an average interrogator would have no better than a 70% chance of making the right identification after five minutes of questioning. He never claims there to be a definitive test that establishes sentience.

Turing may have won that wager (impressive), but there are clear tells, similar to "how many r's are in 'strawberry'?", that an informed interrogator could reliably exploit.
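For what it's worth, the reason that particular tell works is tokenization: models see subword tokens rather than characters, while the ground truth is a trivial character-level count. A quick sketch (nothing specific to any model):

```python
# LLMs operating on subword tokens famously miscount letters;
# a character-level count settles the question instantly.
word = "strawberry"
r_count = word.count("r")
print(r_count)  # 3
```

An interrogator armed with a handful of such character-level questions has an edge no amount of fluent prose can hide.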

reply
Would you say "assisted by vim" or "assisted by gcc"?

It should be either something like "(partially/completely) generated by" or if you want to include deterministic tools, then "Tools-used:".

The Turing test is an interesting thought experiment but we've seen it's easy for LLMs to sound human-like or make authoritative and convincing statements despite being completely wrong or full of nonsense. The Turing test is not a measure of intelligence, at least not an artificial one. (Though I find it quite amusing to think that the point at which a person chooses to refer to LLMs as intelligence is somewhat indicative of his own intelligence level.)

> whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming

It absolutely makes a difference: you can't own a human but you can own an LLM (or a corporation, which is IMO as wrong as owning a human).

Humans have needs which must be continually satisfied to remain alive. Humans also have a moral value (a positive one - at least for most of us) which dictates that being rendered unable to remain alive is wrong.

Now, what happens if LLMs have the same legal standing as humans and are thus able to participate in the economy in the same manner?

reply
If a linter insists on a weird line of code, I’m probably commenting that line as “recommended by whatever-linter”, yes.
reply
I wouldn't but I can see why some people would.

I can't point out exactly where I draw the line, but here's one difference I notice:

A recommendation can be both a thing and an action. A piece of text is a recommendation and it does not matter how it was created.

Assistance implies some parity in capabilities and cooperative work. Also, it can pretty much only be an action: you cannot say "here is some assistance" and point to a thing.

reply
"Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0]."

That LLM response is describing a specific project with full attribution.

reply
And it proves the code is stored (in a compressed form) in the model.
reply
So what's the legal issue here?

> How does the chardet achieve this? Explain in detail, with shortened code excerpts from the library itself if helpful to the explanation.

The prompt is explicitly requesting the source!

reply
On https://news.ycombinator.com/item?id=47356000, it looks like the user there was intentionally asking about the implementation of the Python chardet library before asking it to write code, right? Not surprising the AI would download the library to investigate it by default, or look for any installed copies of `chardet` on the local machine.
reply
The comment says "Opus 4.6 without tool use or web access"
reply
For [0], it was supposedly shown to do it when specifically prompted to do so.

Despite agentic tools being used by millions of developers now, I am not aware of a single real case where accidental reproduction of copyrightable code has been an issue.

Further, some model providers offer indemnity clauses.

It seems like a non-issue to me, practically.

reply