Great summary. The fact that the autoencoding task is not grounded in actual thoughts, together with the initial training on guessed internal thoughts, raises serious concerns about faithfulness. It feels like they might get better results by just training a supervised model on activations and "internal thoughts" measured in some other behavioral way.
Don't they add a KL loss term to the frozen model's outputs?
"deserves the semantic meaning"

you meant "preserves...", right?
