> how can a2mark ensure that AI did NOT do a clean-room conforming rewrite?

In cases like this it is usually incumbent on the entity claiming a pure clean-room process to show their working. For instance, Compaq's clean-room cloning of the IBM BIOS chip[1] was well documented (the procedures used, records of communications between the teams involved), whereas some other manufacturers did face costly legal trouble from IBM.

So the question is “is the clean-room claim sufficiently backed up to stand up to legal tests?” [and moral tests, though the AI world generally doesn't care about failing those]

--------

[1] the one part of their PCs that was not essentially off-the-shelf, so once it could be reliably and legally mimicked, an open market for IBM PC clones emerged

reply
Turns out there’s no need to speculate. Someone pointed out on GH [0] that the AI was literally prompted to copy the existing code:

> *Context:* The registry maps every supported encoding to its metadata. Era assignments MUST match chardet 6.0.0's `chardet/metadata/charsets.py` at https://raw.githubusercontent.com/chardet/chardet/f0676c0d6a...

> Fetch that file and use it as the authoritative reference for which encodings belong to which era. Do not invent era assignments.

[0] https://github.com/chardet/chardet/issues/327#issuecomment-4...

reply
That's data, not code.
reply
It’s a Python file from chardet 6; it doesn’t matter what you think it does. It clearly wasn’t a clean-room reimplementation.
reply
deleted
reply
The foundation model probably includes the original project in its training set, which might be enough for a court to consider it “contaminated”. Training a new foundation model without it is technically possible, but would take months and cost millions of dollars.
reply
A clean room is sufficient, but not necessary, to avoid accusations of license violation.

a2mark has to demonstrate that v7 is "a work containing v6 or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language", which is different from demanding a clean-room reimplementation.

Theoretically, the existence of a publicly available commit that is half v6 code and half v7 could be used to show that this part of the v7 code has been infected by the LGPL and must thus infect the rest of v7, but that is IMO going against the spirit of the [L]GPL.

reply
Please don't use loaded terms like "infect". The license does not infect, it has provisions and requirements. If you want to interact with it, you either accept them or don't use the project. In this case, the author of v7 is trying to steal the copyrighted work of other authors by re-licensing it illegally.
reply
Is their work present in v7?
reply
Yes. The AI operator posted this as the prompt: https://github.com/chardet/chardet/commit/f51f523506a73f89f0...

which, at minimum, instructs it to directly examine the test suite: `4. High encoding accuracy on the chardet test suite`

reply
So what? Is reading code the same as copying code or modifying existing code?
reply
If you want to prove you did not make a derivative work, then yes, it helps if you never read the source code. Hence so-called "clean room" implementations.
reply
Why should I prove that? Let those who claim the violation prove that.
reply
There is plenty of evidence already. The claim has been substantiated.

You can't just dismiss it and then say the claimant has to provide proof.

reply
Yes. Commits clearly show work in progress where both LGPL and MIT code were present together. This clearly shows it is a derivative work and MUST follow the original license.

Plus, the argument put forth is that they can re-license the project. It's not a new one made from scratch.

reply
Did they eventually remove/replace all the LGPL code?
reply
So, if these commits were private and squashed together before 7.0 was published there would be no violation?
reply
The commits being public or not does not change the fact that the development was done as a derivative work of the original version.
reply
They would be concealing the violation.
reply
Consider the TCC relicensing. They identified the files touched by contributors who wanted to keep the GPL license and reimplemented them. No team-A/team-B clean-room approach was used. The same happened here, but at a different scale. All files now have a new author, and this author is free to change the license of his work.
reply
I think the problem here is that an AI is not a legal entity. It doesn't matter if you, as an individual, run an AI that takes the source and dumps out a spec that you then feed into another AI: the legal liability lies with the operator of the AI. The original copyleft license was granted to a person, not to a robot.

Now if you had two entirely distinct humans involved in the process, that might work, though.

reply