The entire point of this post is that it's open weights: you can run it yourself and don't have to deal with the API issues. You really do have that choice.

by ac29 5 hours ago

You could subscribe to Anthropic/OpenAI for the rest of your life for what it would cost to host GLM5.2 locally - you need 1.5TB of VRAM just for the weights.

by zozbot234 4 hours ago

You don't need that much VRAM unless you're targeting a high-performance deployment that's intended to scale far beyond local use. For a lower-throughput case, you can keep the model weights on SSD at very low cost and stream them in for inference. This could actually scale reasonably well if you have something as simple as a previous-gen HEDT with a decent number of PCIe lanes to host fast storage on.
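As a rough back-of-envelope sketch of the streaming idea above: if every token requires reading some number of weight bytes from SSD, throughput is bounded by aggregate drive bandwidth divided by bytes read per token. The function name, the 7 GB/s per-drive figure, and the four-drive setup are illustrative assumptions, not measurements; the 1.5TB figure is the weight size quoted in the parent comment. How much of the model is actually read per token depends on the architecture and caching, which this sketch leaves as an input.

```python
def streaming_tokens_per_second(
    bytes_read_per_token: float,
    drive_bandwidth_bytes_per_s: float,
    num_drives: int = 1,
) -> float:
    """Upper bound on tokens/s when weights are streamed from SSD.

    Hypothetical helper: ignores compute time, caching in RAM/VRAM,
    and any PCIe or filesystem overhead.
    """
    if bytes_read_per_token <= 0:
        raise ValueError("bytes_read_per_token must be positive")
    if drive_bandwidth_bytes_per_s <= 0 or num_drives < 1:
        raise ValueError("need at least one drive with positive bandwidth")
    aggregate_bandwidth = drive_bandwidth_bytes_per_s * num_drives
    return aggregate_bandwidth / bytes_read_per_token


# Assumed example: all 1.5 TB of weights read for every token,
# four NVMe drives at an assumed 7 GB/s sequential read each.
worst_case = streaming_tokens_per_second(1.5e12, 7e9, num_drives=4)
print(f"{worst_case:.4f} tokens/s, {1 / worst_case:.1f} s per token")
```

If only a fraction of the weights has to be read per token, pass that smaller byte count as `bytes_read_per_token`; the bound scales inversely with it.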

by Havoc 7 hours ago

That’s what happens when you offer something decent at a fraction of the price of Opus - more demand than you can serve.

by ComputerGuru 4 hours ago

Give it a few days and additional providers will be up and available on OpenRouter. Then the game of figuring out who’s not nuking the weights and neutering the quantization begins.

by osti 7 hours ago

I did get a few timeouts yesterday using the official API; I imagine it'll be even worse for the coding plan users.