by unrvl22 8 hours ago

by stanac 8 hours ago

> Some are even offering API rates at 3x lower than the official ZAI api rates

Looking at OpenRouter [1], some of the cheaper offerings are for quantized models. Not sure how much intelligence is lost in quantization. And they are not 3 times cheaper. Where did you find 3x lower prices for APIs? I am considering skipping OpenRouter and using them directly for that price.

edit:

I see, croft [2] 8bit for $0.50/$0.08/$2.20

[1]: https://openrouter.ai/z-ai/glm-5.2

[2]: https://ai.nahcrof.com/pricing

by scrlk 7 hours ago

IME, unquantised -> FP8 is pretty much lossless. What matters more is having an unquantized KV cache - using an FP8 KV cache can result in a significant drop in quality.
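To put a rough number on "pretty much lossless": below is a minimal sketch that rounds individual values to FP8 E4M3 precision (1 implicit + 3 stored mantissa bits, largest finite value 448). The helper name `round_to_e4m3_precision` is made up for illustration, and the sketch ignores subnormals, NaN handling, and the per-tensor or per-block scaling that real FP8 inference stacks typically apply, so it only shows per-value rounding error and says nothing about end-to-end model quality.

```python
import math

def round_to_e4m3_precision(x: float) -> float:
    """Round x to 4 significant binary digits, clamped to +/-448.

    Simplified illustration only: no subnormals, no NaN, no scaling factors.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)      # x == m * 2**e with 0.5 <= |m| < 1
    q = round(m * 16) / 16    # keep 4 significant bits of the mantissa
    return max(-448.0, min(448.0, math.ldexp(q, e)))

# Hypothetical sample values, all inside the E4M3 normal range.
samples = [0.05, -0.37, 1.06, 2.9, -17.3]
rel_errors = []
for w in samples:
    q = round_to_e4m3_precision(w)
    rel_errors.append(abs(q - w) / abs(w))
    print(f"{w:>8} -> {q:>8}  rel. error {rel_errors[-1]:.2%}")
```

Under these assumptions the relative rounding error per value is bounded by 1/16 (6.25%), which is small but clearly not zero.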

by johnnyApplePRNG 3 hours ago

>unquantised -> FP8 is pretty much lossless

Claude Shannon is rolling in his grave.

by gpm 2 minutes ago

I don't know, sounds quite similar to his rate distortion theorem (analyzing minimum number of bits/symbol you need to stay under some fixed amount of distortion). I.e. lossy compression with a maximum amount of loss. I.e. "pretty much lossless" compression.

https://en.wikipedia.org/wiki/Rate%E2%80%93distortion_theory
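For reference, the rate-distortion function mentioned above can be written as follows. The Gaussian case is included only as the standard textbook example; the symbols $\sigma^2$ (source variance), $D$ (allowed mean squared error), and $d(\cdot,\cdot)$ (distortion measure) are notation chosen here, not taken from the thread.

```latex
% Minimum rate (bits/symbol) needed to keep expected distortion at or below D
R(D) = \min_{p(\hat{x} \mid x) \,:\, \mathbb{E}[d(X, \hat{X})] \le D} I(X; \hat{X})

% Gaussian source with variance \sigma^2 under squared-error distortion
R(D) =
\begin{cases}
  \tfrac{1}{2} \log_2 \dfrac{\sigma^2}{D}, & 0 < D \le \sigma^2 \\
  0, & D > \sigma^2
\end{cases}
```

In the Gaussian case, each extra bit per symbol cuts the achievable distortion by a factor of four, which is the sense in which a fixed bit budget can be "pretty much lossless" without being lossless.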

by ComputerGuru 4 hours ago

Do infra providers reveal that level of implementation detail?

by scrlk 4 hours ago

I've seen a few articles from providers talking about KV cache quantisation, but it's not something they explicitly point out like they do with weights.

So you could end up paying more for unquantised weights, only to get silently hit with a quantised KV cache...

by benjiro29 7 hours ago

Neuralwatt ... When you reverse calculate the actual energy usage / price on a token basis, the gap is large.

I do not have GLM 5.2 numbers because the whole default max setting is overkill. But GLM 5.1 numbers had it at 12x cheaper than API rates, and about 2.5x more tokens vs. Z.ai's own subscription service.

Yes, it's FP8, but let's be honest: do we know for sure that even Z.ai runs at FP16? I learned a long time ago with Claude and Codex how much cheating happens at the model level, even from the big boys.

by spelk 3 hours ago

Please correct me if you have contradicting data but: Neuralwatt's price per token vs price for energy comparison doesn't seem to take into account the cost savings from cache hits that other providers offer on pure token rates. The comparison seems to assume every input token is a cache miss.
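A minimal sketch of why the cache-miss assumption matters. The default prices reuse the $0.50/$0.08/$2.20 figures quoted earlier in the thread, read here as USD per million tokens for uncached input, cached input, and output; that reading, the function name `request_cost_usd`, and the example token counts are all assumptions made for illustration.

```python
def request_cost_usd(input_tokens, output_tokens, cache_hit_rate,
                     price_in=0.50, price_cached=0.08, price_out=2.20):
    """Cost of one request in USD; prices are USD per million tokens (assumed)."""
    cached = input_tokens * cache_hit_rate      # input tokens served from cache
    uncached = input_tokens - cached            # input tokens billed at full rate
    total = (uncached * price_in
             + cached * price_cached
             + output_tokens * price_out)
    return total / 1_000_000

# Hypothetical agentic-coding request: large, mostly repeated context.
all_miss = request_cost_usd(100_000, 2_000, cache_hit_rate=0.0)
mostly_hit = request_cost_usd(100_000, 2_000, cache_hit_rate=0.9)
print(f"every token a cache miss: ${all_miss:.4f}")
print(f"90% cache hits:           ${mostly_hit:.4f}")
```

With these made-up numbers the same request costs roughly a third as much at a 90% hit rate, so a comparison that treats every input token as a miss can overstate the per-token provider's price considerably.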

On top of that, the cloud offering doesn't seem that well run: they randomly blocked a colleague's API key for a couple of days without any heads-up, had a weird rate-limiting bug, and have been deprecating models without redirects on very short notice, all while taking weeks to onboard new models. I assume some of these problems would be addressed if we had an SLA/enterprise contract.

It's a promising idea though. They offer a $5 trial credit (with an aggressive rate limit), so no harm in trying it out.

by CuriouslyC 8 hours ago

Be careful about unofficial providers, a lot of them misconfigure models or stealth quantize them. For a while the difference between Kimi on the official API and most third party providers was 20-40%.

by thehamkercat 6 hours ago

Kimi K2 had a vendor verifier: https://github.com/MoonshotAI/K2-Vendor-Verifier

(there's a table which shows comparison between vendors)

Also, it seems there's a general one as well (for all kimi models?): https://github.com/MoonshotAI/Kimi-Vendor-Verifier

by cedws 8 hours ago

OpenRouter should be penalising or banning for this.

by kilroy123 7 hours ago

This is my biggest complaint about OpenRouter and I'm a fan. Might be pretty tough at scale?

by orbital-decay 5 hours ago

They have an "exacto" category with providers they supposedly verified

by ComputerGuru 4 hours ago

That’s only for tool use.

by alecco 6 hours ago

Would that align with their VC-backed incentives?

by mrngld 45 minutes ago

If your users can't trust your product then I'd say that'd be a pretty strong incentive?

by unrvl22 8 hours ago

the 2 I mentioned both have a fairly large following, who run benchmarks and absolutely will spot issues.

by Schiendelman 8 hours ago

To answer the question in your first sentence - because it's VERY computationally (ha) expensive as a human being to keep up with all the options. It's also very hard to figure out how to run a model like this. There's no installer. If you really really care, which 99% of people do not, you have to google a guide, and then find out it's out of date...

I've tried a number of these, and the learning curve is very steep compared to "install Claude Code and pay $100/mo". There is no way saving me $50/month matters compared to figuring that out.

by andai 8 hours ago

But it just works with Claude Code? They have a guide on their website.

https://docs.z.ai/devpack/tool/claude

Here's my setup. I add this to my .bashrc

export ZAI_API_KEY="your_key_here"

alias claudez='ANTHROPIC_AUTH_TOKEN="$ZAI_API_KEY" ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic" ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]" ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7" ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.7" claude'

Then I just run claudez

Pro tip: the same thing works with DeepSeek: https://api-docs.deepseek.com/guides/anthropic_api

Even more pro tip: Claude Code can set this up for you haha

by Schiendelman 8 hours ago

Sure, I'm not saying I, a software engineer, cannot do this. I'm saying it's significant onboarding friction.

Unless this were a massive differentiator, people aren't going to be "talking about it" the way GP suggests!

by fc417fc802 7 hours ago

You're seriously suggesting that setting up opencode or tweaking your claude code config or etc is too much trouble to be worth saving $50 /mo? That's absurd. Doubly so when the audience in question is already using LLMs so ... just ask your existing LLM for help if it seems daunting.

by Schiendelman 7 hours ago

I'm not just suggesting that, I'm trying to be crystal clear: it's a gap that probably cuts TAM by 95% or more. Most LLM users are not software engineers. Even those that are don't care enough to muck with their settings to try out a model. Keep in mind I'm not answering the question "Is this hard to install?" - I'm answering the question "Why aren't people talking about this?"

by donohoe 6 hours ago

I would broadly agree with this (based on years of dealing directly with user-facing UX and setup steps). Small hurdles, even easy ones, create larger barriers to adoption than you’d think.

by fc417fc802 7 hours ago

Doesn't pass the sniff test. Casuals messing around already go to far more trouble to set up openclaw or comfyui or what have you.

by Schiendelman 7 hours ago

What percentage of "casuals"? ;)

by neonstatic 6 hours ago

"Casuals" just use the web interface from the provider, which Z.ai also has

by ramraj07 5 hours ago

That's not absurd. Do you know what software engineers make? Do you know what a Starbucks coffee costs? 50 bucks is nothing for someone in that life.

by cromka 4 hours ago

> it's significant onboarding friction.

It's crazy that apparently writing software without knowing how to edit a single config file is normal now.

by egeozcan 1 hour ago

For me it's about tolerance. When I was 13, I could and would customize everything, so much that the computer repair shop told my father that their son "likely is a hacker or something".

At 40, I could easily configure Claude Code to use another model (with a bit of MITM fun, even if there weren't any official guides), but I don't want to invest my attention in, or heavily use, something that will most likely break in the near future.

by johnnyApplePRNG 3 hours ago

It's crazy that apparently doing math without knowing how to do long division by hand is normal now.

by phainopepla2 2 hours ago

Absolutely ludicrous comparison

by bityard 3 hours ago

The real question is: should the file be edited in emacs or vim?

by computerex 2 hours ago

Not really, you can literally have Claude set it up for you.

by skeledrew 6 hours ago

The friction is near 0 when you can ask another LLM to set it up for you.

by Schiendelman 5 hours ago

Here are a few frictions I see that reduce reach, in order:

1) You haven't even heard of it.

2) You have to know to look for both GLM and Z.ai. These are usually in the same article when reporting about GLM is written, at least.

3) You have to understand there could be a benefit in trying it; you have to want to try it for some reason. Their own blog post puts it below Opus 4.8 in each of the three benchmarks they used.

4) You have to figure out the pricing, which isn't obviously in the blog post...

5) When I first went to Z.ai, I got an error popup (not logged in): "You do not have permission to access this resource. Please contact your administrator for assistance." I am using a personal computer...

6) When I typed something in the resultant field and pressed enter, I got "Clear Current Chat? To start a new chat, your current conversation will be discarded. Sign in to save chats"

I think today's article helped with 1 and 2, which helps their top of funnel. But they're fighting a big uphill battle.

by chen66996 7 hours ago

[flagged]

by chillfox 5 hours ago

install opencode, then either pay $10 for their plan, or add an openrouter api key.

by re-thc 5 hours ago

> There's no installer.

There's ZCode (https://zcode.z.ai). Which is like the Codex App.

That's as "easy" as it is for non-devs that you're complaining about.

by qingcharles 3 hours ago

How does it compare to OpenCode? I already have too many LLM CLIs installed :(

by Schiendelman 4 hours ago

I'm not complaining about anything. I'm answering a question.

by gerryf2 5 hours ago

I agree with this.

I'd pay for an out of the box solution. i.e. an Installer with updates

by cedws 8 hours ago

In my org everyone is extremely Claude-pilled to the point you’d think it’s the only LLM that exists, purely because it caters to non-engineers within enterprises.

by embedding-shape 8 hours ago

> Why aren't more people talking about this?

Wasn't this released like 2 days ago? Everyone is still evaluating and playing around with it; things like the submission are just starting to come out. Give it some days at least before jumping to conclusions, ideally weeks.

by unrvl22 8 hours ago

I cancelled my claude sub after realizing I can burn 300m tokens a day of this quality, for $50 a month.

by spelk 3 hours ago

Which coding plan are you using? How are you finding it?

by knollimar 6 hours ago

Isn't it closer to sonnet?

by redox99 6 hours ago

Definitely opus level for coding.

by smith7018 6 hours ago

Do you have benchmarks or at least anecdotes to back that up? I'm not arguing with you; I would just love to see some proof that open models are getting as good as Anthropic's models.

by redox99 5 hours ago

I've been running some test prompts comparing frontier models for webdev, particularly pretty visualizations, physics / orbital simulations, etc.

Do note that GLM is not multi modal, which can be a deal breaker. And these open models are not good outside coding.

by unrvl22 5 hours ago

Look at benchmarks, use the model yourself. I'm usually the first to call BS on every Chinese model that claims to be as good as Opus. This is finally the first one that actually is. It is a massive jump from every previous Chinese model.

by smith7018 5 hours ago

"use the model yourself"

I wish I had the time to set it up and work on side projects but unfortunately life and work have been crazy (as I'm sure many here feel). That's why I asked for anecdotes about it.

by knollimar 2 hours ago

Oic I misremembered OAI scores, I thought Sonnet had 51

by Hamuko 8 hours ago

I’m not that interested in models that I can’t run on my desktop for ~0€, which is my AI budget.

by andai 8 hours ago

Electricity cost seems to be about $30/month for a 32B model on a GPU. It's probably better on Apple hardware.

https://github.com/QuantiusBenignus/Zshelf/discussions/2

Not accounting for hardware, of course :)

by Hamuko 7 hours ago

My Mac Studio uses about 60–80 watts whenever I’m running a model (as measured by the system metrics), so it’s less than 2 kWh/day at full blast. Electricity is like 0.125 €/kWh, so that 24-hour period would be <0.25 €.
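The arithmetic above can be checked with a short sketch. It assumes the machine draws the stated 60 to 80 W continuously for 24 hours at the quoted 0.125 €/kWh; the function name `daily_cost_eur` and the 30-day month are illustrative choices, not from the comment.

```python
def daily_cost_eur(watts: float, eur_per_kwh: float, hours: float = 24.0) -> float:
    """Electricity cost in EUR for running at a constant power draw."""
    kwh = watts * hours / 1000.0   # energy used over the period
    return kwh * eur_per_kwh

low = daily_cost_eur(60, 0.125)    # 1.44 kWh per day
high = daily_cost_eur(80, 0.125)   # 1.92 kWh per day
print(f"per day:   {low:.2f} to {high:.2f} EUR")
print(f"per month: {low * 30:.2f} to {high * 30:.2f} EUR (assuming 30 days)")
```

Both ends of the range stay under 2 kWh and under 0.25 € per day, consistent with the figures in the comment.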

Not accounting hardware in my costs, since I didn’t buy my hardware for running models. Running models is just something it can do in addition to what I got it for.

by NorwegianDude 7 hours ago

The price, processed tokens, and output can be anything, it just depends on what GPU it is.

Nvidia GPUs are much more efficient than Apple hardware for inference (and training).

by igravious 8 hours ago

Cool beans. You're not the target audience then.

by Hamuko 8 hours ago

Did I claim I was? I just said why I and people like me are not talking about it.

by simianwords 8 hours ago

and he said it's cool

by anuramat 8 hours ago

> unlimited tokens for $50 a month

link?

> Why

imho everything but opus produces unusable code (fable was even better...), eg gpt5.5 seems to write the absolute worst code that still technically solves the problem; tbh I'd be totally willing to trade "raw intelligence" for "code taste"

more labs need to figure out whatever anthropic did to destroy everybody else on frontiercode bench

by CuriouslyC 5 hours ago

Opus has the nickname "Slopus" in a lot of circles for a reason. It can write nice code in isolation, but the way it organizes that code and its rigor in addressing edge cases/making sure things are robust leave a lot to be desired. Opus is particularly famous for having a real problem reinventing stuff that already existed in the codebase because it wanted to get to work before exploring sufficiently.

by anuramat 2 hours ago

what you're describing doesn't sound like such a big deal -- it's (A) obvious during review, (B) easy to fix in a single prompt, (C) simple enough to fix manually, (D) can be mitigated with tokenmaxxing (agent review passes, prompting, subagents, etc)

regarding edge cases -- less is more in my experience, as removing is harder than adding
