I'm not 100% sure it's impossible. If it's possible (I don't know) to fix the model's temperature so that it's deterministic, and if you could map the produced words back to tokens (probably via an HMM), then you could make a minimal change to the input and observe the output to model it. If you perform waves of such minimal alterations, you can expect to locate the depth at which each alteration affects the model (the idea being that a small change in the output is likely due to the last layers of the model, and a large change is likely due to the deeper layers). Once you've located most of the weights of the last layer(s?), you can try to solve for them. With a model of hundreds of billions of weights, the last layers will likely be so huge that it's probably technically infeasible, but it's theoretically possible.
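For what it's worth, here is a toy sketch of the perturb-and-observe idea, under heavy assumptions that do not hold for a hosted LLM: the black box is a single deterministic linear layer, it returns raw real-valued outputs rather than sampled text, and the attacker can set its inputs freely. The names `make_black_box`, `query` and `recover_weights` are made up for illustration.

```python
import random


def make_black_box(n_in, n_out, seed=0):
    """Hypothetical stand-in for a deterministic remote model: one hidden linear layer."""
    rng = random.Random(seed)
    hidden_w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

    def query(x):
        # The caller only ever sees outputs, never hidden_w itself.
        return [sum(w * xi for w, xi in zip(row, x)) for row in hidden_w]

    return query, hidden_w


def recover_weights(query, n_in, eps=1e-3):
    """Estimate the weights by nudging one input coordinate at a time."""
    base = query([0.0] * n_in)
    columns = []
    for j in range(n_in):
        x = [0.0] * n_in
        x[j] = eps  # minimal alteration of a single input
        out = query(x)
        # Finite difference: how much each output moved per unit of input j.
        columns.append([(o - b) / eps for o, b in zip(out, base)])
    # columns[j][i] estimates W[i][j]; transpose to get the same layout as W.
    return [list(row) for row in zip(*columns)]


query, hidden_w = make_black_box(n_in=4, n_out=3)
estimate = recover_weights(query, n_in=4)
max_error = max(
    abs(e - w)
    for est_row, true_row in zip(estimate, hidden_w)
    for e, w in zip(est_row, true_row)
)
print(f"largest weight error: {max_error:.2e}")
```

This only works so cleanly because the toy layer is linear and directly observable. With a real model you would see text, not activations, and the number of queries grows with the size of the layer, which is the feasibility problem described above.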

by jorisw11 hours ago|

No, you'd need to have the model on your filesystem for direct access, and then the architecture would need to be the same.

by parineum12 hours ago|

If you have access to the weights, you can just use them as is...

by HarHarVeryFunny7 hours ago|

Anthropic are not saying they have been hacked - they are saying that Alibaba have been sending a lot of requests to their servers.

by antonvs13 hours ago|

You can do things like that - one example is averaging weights between related models - but not with Anthropic's models, because outsiders don't have access to the weights.

by fulafel12 hours ago|

Weights are just data on a server, so we don't know whether outsiders have access (either via a break-in or an arrangement).

by antonvs2 hours ago|

Yes, obviously. That's not the point.