I use it directly with Claude Code [1]. Honestly, IMO it just makes sense to host your own model when you have your own company. You can try something like OpenRouter for now and then set up your own hardware. Since most of these models are MoE, you don't have to load everything into VRAM. A combination of a 5090 + an EPYC CPU + 256GB of DDR5 RAM can go a very long way: you can offload most of the expert layers onto the CPU and leave the rest on the GPU. As usual, Unsloth has a great page about it [2].

[1] https://docs.z.ai/scenario-example/develop-tools/claude [2] https://docs.unsloth.ai/models/glm-4.6-how-to-run-locally
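The expert-offload setup above can be sketched as a llama.cpp invocation. This is just one possible recipe, not a tested config: the GGUF filename, context size, and thread count are placeholders for your own hardware, and `--override-tensor` with an expert-tensor regex is the mechanism llama.cpp exposes for pinning MoE expert weights to CPU RAM while the rest stays on the GPU.

```shell
# Sketch: serve a large MoE GGUF (e.g. a GLM-4.6 quant) with llama.cpp.
# Strategy: ask for all layers on GPU (--n-gpu-layers 99), then force the
# per-expert FFN tensors back onto CPU/system RAM via --override-tensor,
# so only attention + shared weights occupy the 5090's VRAM.
# Model path, context size, and thread count below are placeholders.
./llama-server \
  --model GLM-4.6-UD-Q2_K_XL.gguf \
  --n-gpu-layers 99 \
  --override-tensor ".ffn_.*_exps.=CPU" \
  --ctx-size 32768 \
  --threads 32
```

This won't run without the model weights on disk; the Unsloth page in [2] walks through picking a quant that fits your RAM+VRAM budget.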

Hope you'll share your story if you start. Love your book on LangChain from iirc 2y ago; it got me going.
> There is a recent Stanford study showing most US startups are using less expensive Chinese models

Link?

Idk if this is the reference, but it points in the same direction:

> These days, when entrepreneurs pitch at Andreessen Horowitz (a16z), a major Silicon Valley venture-capital firm, there's a high chance their startups are running on Chinese models. "I'd say there's an 80% chance they're using a Chinese open-source model," notes Martin Casado, a partner at a16z.

https://ixbroker.com/blog/china-is-quietly-overtaking-americ...
