https://artificialanalysis.ai/?intelligence-efficiency=intel...
Looking at their cost breakdown, while input cost rose by $800, output cost dropped by $1400. Granted, whether the output savings offset the input increase will be very use-case dependent, and I imagine the delta is a lot closer at lower effort levels.
Tokenizer changes are one piece to understand for sure, but as you say, you need to evaluate $/task, not $/token or #tokens/task alone.
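To make the $/task point concrete, here is a rough sketch of the arithmetic. The function name, token counts, and prices are all made-up placeholders, not measurements of any real model. It only shows how a run can use more total tokens and still cost less per task when the extra tokens are cheap input and the savings are on expensive output.

```python
def cost_per_task(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Dollar cost of one task, given token counts and $/1M-token prices."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Hypothetical numbers for illustration only (not benchmark data).
INPUT_PRICE = 5.0    # assumed $ per 1M input tokens
OUTPUT_PRICE = 25.0  # assumed $ per 1M output tokens

# "Old" run: fewer input tokens, more output tokens.
old_in, old_out = 100_000, 20_000
# "New" run: tokenizer inflates input by 30%, but the model emits less output.
new_in, new_out = 130_000, 12_000

old_cost = cost_per_task(old_in, old_out, INPUT_PRICE, OUTPUT_PRICE)
new_cost = cost_per_task(new_in, new_out, INPUT_PRICE, OUTPUT_PRICE)

print(f"old: {old_in + old_out} tokens/task, ${old_cost:.2f}/task")
print(f"new: {new_in + new_out} tokens/task, ${new_cost:.2f}/task")
```

With these placeholder values the new run uses more tokens per task yet costs slightly less per task. Flip the input/output mix (say, a long-context task with short answers) and the comparison can go the other way, which is why it is so use-case dependent.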
Though, from my limited testing, the new model is far more token-hungry overall.
I’ve noticed 4.7 cycling a lot more on basic tasks. Though it also seems a bit better at holding long-running context.