> A perturbation of the activations that made Claude identify as the Golden Gate Bridge.

Great, now we've got digital Salvia

Golden Gate Claude was two years ago, and it's surprising there hasn't been more research into targeted activations since.
There’s been some, but naive activation steering pretty reliably makes models dumber, and training an SAE (sparse autoencoder) is a heavy lift.