> For instance, company buys an RX6000 setup for say $15k total. They could use this for handling data heavy sifting that would otherwise be a lot of Claude tokens.

Considering they might already be spending thousands per month on API costs, dropping $15k to save on one process might not be bad. On the other hand, it's also an opportunity to sell GLM 5.2 inference at near cost to other companies for less than whatever Claude costs. In theory it costs anywhere from $0.51 to just under $2 an hour to run, and even running it 24/7 that's still wildly cheaper than calling Opus, which bills per million tokens rather than per hour and comes out drastically higher. Hell, you could probably bill at $5 per GPU hour and still be cheaper. Whether you're looking to self-host or sell hosting for it, it looks way cheaper regardless.

I think most decent open models will continue to fit on cards with at least 32GB of VRAM, so a 6000 Pro GPU is more than enough. Alternatively, even on a 5090 you can get a reasonable amount of inference for way less than paying for Opus, though Qwen would be your friend there.
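If you want to sanity-check the math yourself, here's a rough sketch. Only the $15k hardware price and the $0.51 to $2 hourly range come from the numbers above. The throughput (`tokens_per_second`) and the monthly API bill are made-up placeholders, and the function names are just for illustration, so plug in your own measurements before trusting any of it.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Convert an hourly running cost into an effective price per million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def breakeven_months(hardware_usd: float, monthly_api_usd: float,
                     hourly_cost_usd: float, hours_per_month: float = 730):
    """Months until the hardware pays for itself, or None if it never does."""
    monthly_running = hourly_cost_usd * hours_per_month
    monthly_savings = monthly_api_usd - monthly_running
    if monthly_savings <= 0:
        return None  # self-hosting costs at least as much as the API bill
    return hardware_usd / monthly_savings


if __name__ == "__main__":
    # ASSUMPTION: 1000 tokens/s aggregate throughput is a placeholder, not a benchmark.
    assumed_tps = 1000.0
    # ASSUMPTION: $3000/month stands in for "thousands per month" of API spend.
    assumed_api_bill = 3000.0

    for hourly in (0.51, 2.0):  # hourly running-cost range quoted above
        per_million = cost_per_million_tokens(hourly, assumed_tps)
        months = breakeven_months(15_000, assumed_api_bill, hourly)
        print(f"${hourly:.2f}/h -> ${per_million:.3f} per 1M tokens, "
              f"break-even in {months:.1f} months")
```

The result swings a lot with throughput: halve the tokens per second and the per-million price doubles, so real batch throughput on your card is the number that matters most here.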