Pretty easy to test, I’d imagine, on a local LLM that exposes internals.
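Roughly, something like this, assuming a Hugging Face model and a forward hook; the model name, layer choice, steering scale, and the random stand-in for a real concept vector are all just illustrative:

    # Sketch of activation steering on a local model; the random vector
    # stands in for a real extracted concept direction.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # any local causal LM with exposed layers works
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Stand-in for an extracted "enjoyment" direction in the residual stream.
    steer = torch.randn(model.config.hidden_size)
    steer = steer / steer.norm()

    def hook(module, inputs, output):
        # GPT-2 decoder blocks return a tuple; hidden states come first.
        hidden = output[0] + 4.0 * steer.to(output[0].dtype)  # scale is a free knob
        return (hidden,) + output[1:]

    handle = model.transformer.h[6].register_forward_hook(hook)  # a middle layer
    ids = tok("The best part of this task is", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=30, do_sample=False)
    handle.remove()
    print(tok.decode(out[0], skip_special_tokens=True))

Compare generations with and without the hook to see whether the injected direction actually changes how the model approaches the task.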

I’d suspect that injecting signals for enjoyment would lead to solutions that aren’t necessarily better, just “different”.

Right now I’m thinking of it as increasing the chances that the LLM will decide to invest further effort in any given task.

Performance enhancement through emotional steering definitely seems in the cards, but it might show up mostly as a reduction in emotionally induced error categories rather than as generically “higher benchmark performance”.

If someone came along and pissed you off while you were working, you’d react differently than if someone came along and encouraged you while you were working, right?

reply
If you think training a sparse autoencoder to extract concept vectors that are usable as steering injections into a modern LLM is pretty easy, you should probably go work for Anthropic's mech interp team ;)
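The autoencoder skeleton itself is the easy part; a bare-bones version over cached residual-stream activations looks like this (dimensions, the L1 coefficient, and the random stand-in data are all placeholders, not anyone's actual recipe):

    # Minimal sparse autoencoder over residual-stream activations.
    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        def __init__(self, d_model=768, d_hidden=8 * 768):
            super().__init__()
            self.enc = nn.Linear(d_model, d_hidden)
            self.dec = nn.Linear(d_hidden, d_model)

        def forward(self, x):
            feats = torch.relu(self.enc(x))  # sparse feature activations
            return self.dec(feats), feats

    sae = SparseAutoencoder()
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
    acts = torch.randn(4096, 768)  # stand-in for activations cached from a model

    for step in range(1000):
        batch = acts[torch.randint(len(acts), (256,))]
        recon, feats = sae(batch)
        # Reconstruction loss plus an L1 penalty to force sparse features.
        loss = (recon - batch).pow(2).mean() + 1e-3 * feats.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The "concept vector" for feature i is the i-th decoder column.
    concept = sae.dec.weight[:, 0].detach()

The hard part is everything around that skeleton: caching activations at scale, keeping features from dying during training, and then working out which of the thousands of learned features, if any, corresponds to something like "enjoyment".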
reply