"Comparable" is doing some heavy lifting there. Comparable to Anthropic models in 1H'25, maybe.
But let's say, for the sake of discussion, that Opus is much better. That still doesn't justify the price disparity, especially considering that the other models are served by commercial inference providers while Anthropic's inference is in-house.
The problem here is that people think AI benchmarks are analogous to, say, CPU performance benchmarks. They're not:
* You can't control all the variables, only one (the prompt).
* The outputs, BY DESIGN, can fluctuate wildly for no apparent reason (e.g., utter failure on the first run, success on the second; rough sketch after this list).
* The biggest point: once a benchmark is known, future iterations of the model will be trained on it.
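
To make the fluctuation point concrete, here's a minimal Python sketch. The `run_model` stub and its 60% "true" pass rate are invented for illustration (a real benchmark would call an actual model at nonzero temperature); the point is that a single run gives you a near coin-flip verdict, and even repeated runs leave a sizable error bar.

```python
import random

# Hypothetical stand-in for a real model call; the 60% "true" pass rate
# is made up purely to illustrate run-to-run variance.
def run_model(prompt: str) -> bool:
    return random.random() < 0.60  # task passes ~60% of the time

prompt = "some benchmark task"
n_runs = 20

results = [run_model(prompt) for _ in range(n_runs)]
pass_rate = sum(results) / n_runs
# Rough binomial standard error: how much the measured rate wobbles.
std_err = (pass_rate * (1 - pass_rate) / n_runs) ** 0.5

print(f"single-run verdict: {'pass' if results[0] else 'fail'}")
print(f"pass rate over {n_runs} runs: {pass_rate:.2f} +/- {std_err:.2f}")
```

Even at 20 repeats the error bar on that assumed 60% rate is still around +/- 0.11, so two people running the "same" benchmark can land on visibly different scores without either doing anything wrong.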
Trying to objectively measure model performance is a fool's errand.