They said the 2.5X offering is what they've been using internally. Now they're offering it via the API: https://x.com/claudeai/status/2020207322124132504
LLM APIs are tuned to handle many parallel requests: batching pushes overall token throughput up, but each individual request gets processed more slowly.
The scaling curves aren't that extreme, though. I doubt they could tune the knobs to get individual requests coming through at 10X the normal rate.
This likely comes from some servers being tuned for lower per-request latency, at the expense of overall token throughput. It's possible it's also running on a newer generation of serving hardware.
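To make the trade-off concrete, here's a toy sketch of batched serving. The linear latency model and every constant in it are my own assumptions for illustration; they aren't anything Anthropic has published.

```python
# Toy model of batched LLM serving: bigger batches raise aggregate token
# throughput but slow down each individual request. All constants are
# made up for illustration.

def per_token_latency_ms(batch_size: int,
                         base_ms: float = 10.0,
                         per_request_ms: float = 0.8) -> float:
    """Assumed linear model: each extra request in the batch adds a small
    amount of time to every decode step."""
    return base_ms + per_request_ms * batch_size

def stats(batch_size: int) -> tuple[float, float]:
    latency = per_token_latency_ms(batch_size)
    # Aggregate throughput: tokens produced per second across the whole batch.
    throughput = batch_size * 1000.0 / latency
    return latency, throughput

for b in (1, 8, 64, 256):
    lat, tput = stats(b)
    print(f"batch={b:>3}  per-token latency={lat:6.1f} ms  "
          f"aggregate throughput={tput:7.0f} tok/s")

# Under these assumed numbers: batch=1 gives ~93 tok/s total at ~10.8 ms/token
# per user, while batch=256 gives ~1190 tok/s total but ~215 ms/token per user.
```

The point of the toy model is just that a provider can trade one axis for the other by how it packs requests onto servers, which is why a "fast" tier doesn't require anything conspiratorial.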
This is such bizarre magical thinking, borderline conspiratorial.
There is no reason to believe any of the big AI players are serving anything less than the best trade-off of stability and speed they can possibly muster, especially when their cost ratios are so bad.
Just because you can't afford to 10x all your customers' inference doesn't mean you can't afford to 10x your in-house inference.
And 2.5x is from Anthropic's latest offering. But it costs you 6x normal API pricing.
> codex-5.2 is really amazing but using it from my personal and not work account over the weekend taught me some user empathy lol it’s a bit slow
These companies aren't operating in a vacuum. Most of their users could change providers quickly if they started degrading their service.
The reason people don't jump to your conclusion here (and why you get downvoted) is that, for anyone familiar with how this is orchestrated on the backend, it's obvious they don't need artificial slowdowns.
Also, I was just pointing out the business issue, raising a point that hadn't been made here. I just want people to be more cautious.