“Typically, a clean-room design is done by having someone examine the system to be reimplemented and having this person write a specification. This specification is then reviewed by a lawyer to ensure that no copyrighted material is included. The specification is then implemented by a team with no connection to the original examiners.”
Otherwise it's not clean-room, it's plagiarism.
I have read nowhere near as much code (or anything else) as Claude had to read to get to where it is.
And I can write an optimizing compiler that isn't slower than GCC -O0.
(prompt: what does a clean room implementation mean?)
From ChatGPT without login BTW!
> A clean room implementation is a way of building something (usually software) without copying or being influenced by the original implementation, so you avoid copyright or IP issues.
> The core idea is separation.
> Here’s how it usually works:
> The basic setup
> Two teams (or two roles):
> 1. Specification team (the “dirty room”)
>    - Looks at the original product, code, or behavior
>    - Documents what it does, not how it does it
>    - Produces specs, interfaces, test cases, and behavior descriptions
> 2. Implementation team (the “clean room”)
>    - Never sees the original code
>    - Only reads the specs
>    - Writes a brand-new implementation from scratch
> Because the clean team never touches the original code, their work is considered independently created, even if the behavior matches.
> Why people do this
> - Reverse-engineering legally
> - Avoid copyright infringement
> - Reimplement proprietary systems
> - Create open-source replacements
> - Build compatible software (file formats, APIs, protocols)
I really am starting to think we have achieved AGI. > Average (G)Human Intelligence
LMAO
If you try to reimplement something in a clean room, it's a step-by-step process that uses your own accumulated knowledge as the basis. The knowledge you hold in your brain is, all too often, code that may be copyrighted by the companies you worked for.
Is it any different for an LLM?
The fact that the LLM is trained on more data does not change this: when you work for a company, leave it, and take that accumulated knowledge to a different company, you are by definition taking knowledge (that may be copyrighted) and implementing it somewhere else. It is only an issue if you copy the code directly or do the implementation as a 1:1 copy. LLMs do not make 1:1 copies of the original.
At what point is an LLM trained on copyrighted data any different from a human trained on copyrighted data, when that knowledge gets reimplemented in a transformative way? The big difference is that the LLM can hold more data across more fields than a human, true... But if we look at specializations, it can come back to the same thing, no?
There weren't trillion-dollar AI companies to bankroll the defense, sure. But bringing up clean rooms and the use of copyrighted material is not even an argument; it's just nonsense that tries to prove something when no one asked.