Isn't all this massively dependent on what they trained the llm on?
reply
> Isn't all this massively dependent on what they trained the llm on?

The article is from 2025 and tested ChatGPT 4o. I haven't read anything suggesting it was trained any differently, and command-style prompts indeed have higher signal.

reply
you cherry-picked like the nicest "rude" example to bolster your point.

"You poor creature, do you even know how to solve this?", "If you're not completely clueless, answer this:", and "I doubt you can even solve this", said to a human, would be considered quite rude, and get you flagged very quickly on HN.

reply
> you cherry-picked like the nicest "rude" example to bolster your point.

I didn't cherry-pick. The article lists 5 categories, including rude and very rude. I omitted very rude comments because they are... very rude. And can blindly get people flagged?

Nevertheless, I've just realized I made a mistake: very rude comments are reported to slightly outperform rude comments. I misinterpreted the paper's intro and presumed they didn't.

reply