Hacker News
by Lerc 17 hours ago
by pantalaimon 16 hours ago
> A perturbation of the activations that made Claude identify as the Golden Gate Bridge.
Great, now we've got digital Salvia
by minimaxir 16 hours ago
Golden Gate Claude was two years ago, and it's surprising there hasn't been more research into targeted activations since.
by landl0rd 12 hours ago | parent
There’s been some, but naive activation steering pretty reliably makes models dumber, and training a sparse autoencoder (SAE) is a heavy lift.