> Commercial inference providers serve Chinese models of comparable quality…

"Comparable" is doing some heavy lifting there. Comparable to Anthropic models in 1H'25, maybe.

Benchmarks suggest they're comparable: https://artificialanalysis.ai/?models=claude-opus-4-6-adapti...

But let's say, for the sake of discussion, that Opus is much better. That still doesn't justify the price disparity, especially considering that the other models are served by commercial inference providers while Anthropic's is in-house.

Try doing real work with them; it's a night-and-day difference, especially for systems programming. The non-frontier models do a lot of benchmaxxing to look good.
> Benchmarks suggests they are comparable

The problem here is that people think AI benchmarks are analogous to, say, CPU performance benchmarks. They're not:

* You can't control all the variables, only one (the prompt).

* The outputs, BY DESIGN, can fluctuate wildly for no apparent reason (e.g., first run, utter failure; second run, success).

* The biggest point: once a benchmark is known, future iterations of the model will be trained on it.
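The second point is easy to demonstrate. Sampling at temperature > 0 means the same prompt draws different tokens each run, so a single benchmark run is a coin flip, not a measurement. A toy sketch (the token distribution and pass rate here are made up for illustration, not from any real model):

```python
import random

# Hypothetical next-token outcomes for one fixed benchmark prompt.
# Pretend the model only produces a passing answer 60% of the time.
OUTCOMES = ["pass", "fail"]
WEIGHTS = [0.6, 0.4]

def run_benchmark_once(rng):
    # Same prompt, same "model", different sample every call.
    return rng.choices(OUTCOMES, weights=WEIGHTS, k=1)[0]

rng = random.Random()
results = [run_benchmark_once(rng) for _ in range(10)]
# Identical inputs, mixed outcomes: whether this "model" looks good
# on the leaderboard depends on which run you happened to record.
```

A CPU benchmark rerun on the same hardware varies by a percent or two; here the variance between runs is the whole result.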

Trying to objectively measure model performance is a fool's errand.
