https://deepinfra.com/zai-org/GLM-5.1
Looks like it's FP4 quantization now, though? Last week it was showing FP8. Hm...
I also regularly see DeepInfra slow to an absolute crawl; I've actually gotten more consistent performance from Z.ai.
I really liked DeepInfra, but something doesn't seem right over there at the moment.
It's frankly a bummer that there doesn't seem to be a better serving option for GLM 5.1 than Z.ai, which seems to have reliability and cost issues.
CC has limited capacity for Opus, but it's fairly good for Sonnet. With Codex, I've never had issues with hitting my limits, and I'm only a Pro user.
It's not crushing Opus 4.5 in real-life use for me, but it's close enough to be nearly interchangeable with Sonnet for a lot of tasks, though some of the "savings" are eaten up by it seemingly using more tokens for tasks of similar complexity (I don't have enough data yet, but I've pushed ~500M tokens through it so far).
They have difficulty supplying their users with capacity, but in an email they pointed out that they are aware of it. During peak hours, I experience degraded performance. But I'm on their lowest-tier subscription, so I understand if my demand isn't prioritized during those hours.
https://arena.ai/leaderboard/text?viewBy=plot&license=open-s...
I did give it one more complex task, and I was quite impressed by the result. I had a local setup with Tiltdev, K3s and a pnpm monorepo which was failing to run the web application dev server; GLM correctly figured out that it was a container image build cache issue after inspecting the containers, etc., and then corrected the Tiltfile and build setup.
For more complicated stuff, like queries or data comparison, Codex always seems to be behind for me.
OpenAI, on the other hand, has separate models optimized for coding (GPT-x-codex); Anthropic doesn't have this distinction.