Isn't that what https://github.com/uutils/coreutils is? The GNU coreutils spec and test suite, used to produce a Rust MIT-licensed implementation. (Granted, by humans, AFAIK.)
1. Generate a specification of what the system does.
2. Pass it to another "clean" system.
3. The second, clean system implements based solely on the specification, with no information about the original.
That third step is the hardest, especially for well-known projects.
Then the model that is familiar with the code can write the specs, and the model that has no knowledge of the project can implement them.
Would that be a proper clean room implementation?
Seems like a pretty evil but profitable product: "rewrite any code base with an inconvenient license into your proprietary version, legally".
2. Dumped into a file.
3. claude-code then converts this to tests in the target language and implements the app that passes those tests.
Step 3 is no longer hard: look at all the reimplementations popping up, from ccc to the various rewrites. They all share a well-defined test suite as a common theme, so much so that the tldraw author raised a (joking) issue asking to remove the tests from the project.
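A minimal sketch of step 3, assuming the recorded behavior from step 2 has been dumped as plain input/output pairs. The `RECORDED` table and `word_count` function here are hypothetical stand-ins, not from any real project:

```python
# Hypothetical behavior dumped from the original tool (step 2):
# each entry maps an observed input to the observed output.
RECORDED = {
    "hello world": 2,
    "": 0,
    "one  two   three": 3,
}

def word_count(text: str) -> int:
    """Clean-room reimplementation, written only against RECORDED,
    never against the original source."""
    return len(text.split())

# Step 3: the recorded pairs become the test suite in the target language.
for given, expected in RECORDED.items():
    assert word_count(given) == expected, (given, expected)
```

The reimplementer only ever sees the table, which is the whole point of the exercise.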
AI muddies the water because large models trained on public repos can reproduce GPL snippets verbatim, so prompting with tests that mirror the original risks contamination, and a court could find substantial similarity. To reduce the risk: use black-box fuzzing and property-based tools, have humans review and scrub model outputs, run similarity scans, and budget for legal review before calling anything MIT.
Our knowledge of what the person or the model actually retains of the original source is fundamentally incomplete, yet the entire premise requires full knowledge that nothing remains.
The thesis I propose is that tests are more akin to facts, or can be stated as facts, and facts are not copyrightable. That's what makes this case interesting.
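One way to read that: a property test states a fact about behavior ("the output of sort is ordered and is a permutation of the input") rather than copying any expression from the original. A minimal sketch, with `my_sort` as a hypothetical reimplementation under test:

```python
import random
from collections import Counter

def my_sort(xs):
    """Hypothetical reimplementation under test."""
    return sorted(xs)

random.seed(1)
for _ in range(200):
    xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
    out = my_sort(xs)
    # Fact 1: the output is ordered.
    assert all(a <= b for a, b in zip(out, out[1:]))
    # Fact 2: the output is a permutation of the input.
    assert Counter(out) == Counter(xs)
```

Nothing in those two assertions depends on how the original implemented sorting; they are statements about what any correct sort must do.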
If "tests" means a proper specification, say an IETF RFC for a protocol, then that would be a different matter.