by edg5000 15 hours ago

by data-ottawa 15 hours ago

Well I am definitely not using the models that I'm not able to access.

So now the question is whether the capabilities of other models are worth their far cheaper token prices.

Plus, are we at all confident Opus or GPT 5.5 aren't about to get shut off?

by bean469 12 hours ago

Not everyone needs the SOTA. Many people also weigh speed, token / plan cost, and other factors when choosing a model.

by ignoramous 11 hours ago

> Nothing has reached Opus and GPT5 levels in my personal experience

You mean GPT 5.5 xhigh and Claude Opus 4.8 max? At least the benchmarks / public evals / rankings show that some of the new coding models (e.g. Qwen 3.7 Max & MiMo v2.5 Pro) are at Opus 4.7 & GPT 5.4 level (but 3x to 5x cheaper): https://artificialanalysis.ai/leaderboards/models / https://gertlabs.com/rankings

Personally speaking, over the past month or so, I haven't missed GPT 5.4 / Opus 4.7 after moving to Qwen 3.7 / MiMo 2.5 / Kimi 2.6 et al.

by edg5000 9 hours ago

That is very promising news. I will re-eval them all shortly. And you are suggesting that a higher reasoning budget can make up for weaker per-token performance? That is indeed worth evaluating.

Comparing models at their vendor-specific effort settings is apples to oranges. Ideally the evals would use a thinking token cap or something similar, so we can compare per-token performance. But eval is hard enough as it is.
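
To make the token-cap idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the `Run` record, the model names, and the numbers are made up for illustration and are not real eval results. It also only approximates a cap after the fact, by counting any run that went over the thinking budget as a failure, whereas a real capped eval would cut the reasoning off during generation.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One eval attempt. All values used below are hypothetical illustration data."""
    model: str
    passed: bool
    thinking_tokens: int


def pass_rate_under_cap(runs, cap):
    """Per-model pass rate, counting any run over `cap` thinking tokens as a failure."""
    totals, passes = {}, {}
    for r in runs:
        totals[r.model] = totals.get(r.model, 0) + 1
        ok = r.passed and r.thinking_tokens <= cap
        passes[r.model] = passes.get(r.model, 0) + (1 if ok else 0)
    return {m: passes[m] / totals[m] for m in totals}


# Made-up runs: model_a passes more often but sometimes thinks for much longer.
runs = [
    Run("model_a", True, 2_000),
    Run("model_a", True, 9_000),
    Run("model_b", True, 3_000),
    Run("model_b", False, 4_000),
]

print(pass_rate_under_cap(runs, cap=10_000))
print(pass_rate_under_cap(runs, cap=5_000))
```

With a generous cap the model that thinks longer looks better; under a tighter shared cap the two made-up models tie, which is the kind of difference a vendor-specific effort setting would hide.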

by surgical_fire 2 hours ago

I have been using DeepSeek at home. I have access to Claude and ChatGPT at work.

I honestly think that DeepSeek is as good as, and sometimes even better than, the competition.