There is probably a whole testing workflow at AI companies to tweak each new model until it "looks" acceptable.
But they still don't understand what they are doing. This is purely empirical.
That Nerdy personality prompt made me gag. As a card-carrying Nerd, I feel offended