As for checking outputs - I don't believe that's sufficient. Maybe the letter of the law is flawed but according to the spirit the model itself is derivative work.
A model takes several orders of magnitude more work as training data than it takes to code the training algorithm itself, to any reasonable and sane person, that makes it a derivative work of the training data by nearly 100% - we can only argue how many nines it should be.
> precedent
Yeah but the US system makes me very uneasy about it. The right way to do this is to sit down, talk about the options and their downstream implications, talking about fairness and justice and then deciding what the law should be. If we did that, copyright law would look very different in the first place and this whole thing would have an obvious solution.