They're using the truncated SVD rather than the full variant; the truncated version is computationally cheaper.
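For anyone wondering why truncation is cheaper: a full SVD of an m x n matrix costs roughly O(mn * min(m, n)), while an iterative method that only recovers the top k singular triplets costs roughly O(mnk) per pass. Here's a minimal stdlib-only sketch of the idea using power iteration with deflation. To be clear, this is my own illustration and not necessarily what the paper does; `truncated_svd` and the helpers are made-up names, and real libraries use more robust Lanczos or randomized methods.

```python
import math
import random


def matvec(A, x):
    """Compute A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T @ y without building the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def truncated_svd(A, k, iters=200, seed=0):
    """Return the top-k (sigma, u, v) triplets of A.

    Hypothetical helper: power iteration on A^T A with deflation.
    Assumes A has rank >= k; only k directions are ever computed.
    """
    rng = random.Random(seed)
    n = len(A[0])
    triplets = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            # One step of power iteration on A^T A.
            w = rmatvec(A, matvec(A, v))
            # Deflation: remove components along right singular
            # vectors that were already found.
            for _, _, v_prev in triplets:
                c = dot(w, v_prev)
                w = [a - c * b for a, b in zip(w, v_prev)]
            w_norm = norm(w)
            v = [x / w_norm for x in w]
        Av = matvec(A, v)
        sigma = norm(Av)
        u = [x / sigma for x in Av]
        triplets.append((sigma, u, v))
    return triplets
```

Calling `truncated_svd(A, k)` with a small `k` only ever touches k directions, which is where the savings come from when k is much smaller than min(m, n).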
Fair points, especially on GSM8K saturation and Qwen possibly already sitting close to the solution. That said, even if this is mostly "last-mile alignment", the fact that it can be done with such a tiny signal is still interesting: it suggests the gap between capability and behavior might be much smaller (and cheaper to bridge) than we assume.
Yeah, my big problem with the paper is that the result might just be an artifact of Qwen's training process.
In all fairness, most of the unique stuff I can do is probably an artifact of my training process, so it seems unfair to deny an LLM the same accommodation.