Most LLMs are trained on a lot of the source code of many open-source projects. This 'project' does the whole song-and-dance about never seeing the source code and separating the system to skirt around legal trouble. Why hasn't anyone done that yet?
Because that's impossible. Any "robot" that can generate code must be trained on massive amounts of code, most of which is open source.
And how are you supposed to guarantee equivalent functionality by analyzing "README files, API docs, and type definitions"?
It's described on the web page: it uses two agents. One has access to the code and one doesn't.
Are they the same model?

Not that it matters, I just think the joke is more fun if they are different.

The joke is that you don’t.
Not a lot of code is public domain, and thus not a lot of training data is available.
For each project you want to rip off, you'd have to first train an entirely new LLM on all sources except for the target project. Prohibitively expensive.