That said, most of this repo is solving the wrong problem. "Answer before reasoning" actively hurts quality, and the benchmark is basically meaningless. But the anti-sycophancy rules should just be default. "Great Question!" has never really helped anyone debug anything.
So the market kind of works in this instance.