You said you'd need $80k in hardware for "good performance". I'm saying a local AI inference workflow can be a lot more flexible about performance than that, and can get away with something vastly cheaper, in line with hardware the user already owns.
> paying $100/month

There will never be a monthly subscription for LLM tokens. The economics aren't there.

Local tokens will always be cheaper.

What's the basis for saying local tokens will always be cheaper? As others have outlined, an LLM serving one user at a time is pretty expensive, but batching concurrent users is much more cost-effective (assuming there's enough RAM for all the contexts). If "local" to you means ~10 hours of daily use by a team of employees, the company still has to weigh that against cloud services that amortize their non-recurring costs over 24 hours of service per day. A rough sketch of that amortization is below.
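To make the amortization argument concrete, here's a back-of-envelope calculation. Every number in it (hardware price, lifetime, throughput, power draw, electricity rate) is a made-up assumption for illustration, and `local_cost_per_mtok` is just a hypothetical helper, not anyone's actual pricing model:

```python
# Back-of-envelope: amortized cost per million tokens for a local rig.
# All inputs are illustrative assumptions, not measured data.

def local_cost_per_mtok(hw_cost_usd: float, lifetime_years: float,
                        hours_per_day: float, tokens_per_sec: float,
                        power_watts: float = 0.0,
                        usd_per_kwh: float = 0.15) -> float:
    """Spread hardware and electricity cost over total tokens produced."""
    seconds_of_use = lifetime_years * 365 * hours_per_day * 3600
    total_mtok = tokens_per_sec * seconds_of_use / 1e6
    energy_usd = (power_watts / 1000) * (seconds_of_use / 3600) * usd_per_kwh
    return (hw_cost_usd + energy_usd) / total_mtok

# Hypothetical rig: $10k, 3-year life, 30 tok/s, 500 W, used 10 h/day.
print(f"${local_cost_per_mtok(10_000, 3, 10, 30, 500):.2f}/Mtok")  # ≈ $9.15
```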
Why wouldn't a team of employees be able to run AI workloads 24/7? Not all workloads are time-sensitive.
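Plugging 24-hour utilization into the hypothetical sketch above (same made-up numbers):

```python
# Same assumed rig, but kept busy around the clock with batch jobs.
print(f"${local_cost_per_mtok(10_000, 3, 24, 30, 500):.2f}/Mtok")  # ≈ $4.22
```

Under those assumptions, queueing non-urgent jobs overnight roughly halves the amortized per-token cost, which is exactly the point about workloads not being time-sensitive.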