upvote
I'm not 100% sure it's not possible. If (I don't know) it's possible to freeze the temperature of the model so it's deterministic, and if you could make a map of produced words back to tokens (via HMM probably), then you can probably alter a minimal input and observe the output to model it. If you perform waves of such minimal alterations, you can expect to be able to locate the distance where each alteration impact the model (the idea being that a small alteration on output is likely due to the last layers of the models, and a small alteration is likely due to the deeper layer). Once you've located most of the last layer(s?) weights, you can try to solve for them. With a hundreds of billions weights model, the last layers will likely be so huge that it's probably unfeasible technically, but it's theoretically possible.
reply
No, you'd need to have the model on your filesystem for direct access, and then the architecture would need to be the same.
reply
If you have access to the weights, you can just use them as is...
reply
Anthropic are not saying they have been hacked - they are saying that Alibaba have been sending lot of requests to their servers.
reply
You can do things like that - one example is averaging weights between related models - but not with Anthropic's models, because outsiders don't have access to the weights.
reply
Weights are just data a server, so we don't know outsiders have access (either via breakin or arrangement).
reply
Yes, obviously. That's not the point.
reply