That site is useless, though, because it doesn't account for thinking tokens, caching, or how efficiently a model uses either. GLM5.2 is promoted by every 50 Cent Party the PLA can muster on the internet, but it falls short because of its extremely verbose thinking. Anthropic models have the same problem, but they start from a much higher base of real intelligence.

That is exactly why every credible comparison now reports the cost of completing a task, not arbitrary per-token input and output prices.
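
To make the per-task point concrete, here is a rough sketch. Every price and token count in it is made up for illustration and comes from no real model or price list, and `Pricing`, `TaskUsage`, and `task_cost` are names invented for this example. It assumes thinking tokens are billed at the output rate and cached input is billed at a discounted rate, which may not match any given provider.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical prices in USD per million tokens."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float  # assumption: thinking tokens are billed at this rate


@dataclass
class TaskUsage:
    """Token counts for completing one task end to end."""
    input_tokens: int          # total input, cached or not
    cached_input_tokens: int   # portion of input_tokens served from cache
    output_tokens: int         # visible answer
    thinking_tokens: int       # reasoning tokens


def task_cost(p: Pricing, u: TaskUsage) -> float:
    """Cost in USD of one task, counting caching and thinking tokens."""
    uncached = u.input_tokens - u.cached_input_tokens
    billed_output = u.output_tokens + u.thinking_tokens
    total = (
        uncached * p.input_per_mtok
        + u.cached_input_tokens * p.cached_input_per_mtok
        + billed_output * p.output_per_mtok
    )
    return total / 1_000_000


# Made-up numbers: a cheap but verbose model vs. a pricier, terser one.
cheap = task_cost(
    Pricing(input_per_mtok=1.0, cached_input_per_mtok=0.1, output_per_mtok=4.0),
    TaskUsage(input_tokens=200_000, cached_input_tokens=150_000,
              output_tokens=2_000, thinking_tokens=60_000),
)
pricey = task_cost(
    Pricing(input_per_mtok=3.0, cached_input_per_mtok=0.3, output_per_mtok=15.0),
    TaskUsage(input_tokens=200_000, cached_input_tokens=150_000,
              output_tokens=2_000, thinking_tokens=8_000),
)

print(f"cheap/verbose: ${cheap:.3f} per task")   # $0.313
print(f"pricey/terse:  ${pricey:.3f} per task")  # $0.345
```

With these invented inputs, a 3.75x gap in output price shrinks to roughly a 1.1x gap in cost per task, because the cheaper model spends far more thinking tokens. Real ratios depend entirely on measured usage.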

> much higher base of real intelligence

Not sure how much "real intelligence" is to be found in Mythos & Sol, but at this point, intelligence gap aside, I find it totally impressive that the likes of GLM, Kimi, Qwen, and MiMo hold their own at 2x to 4x lower cost and work just as well for my use case.
