It's odd that no manufacturer has jumped on this bandwagon to offer a competitive alternative.
reply
Is there even enough market for this?

These models are dumber and slower than API SoTA models and will always be.

My time and sanity are worth much more than insurance against whatever risk comes from sending my garbage code to companies worth hundreds of billions of dollars.

For most, using local models is a downgrade on multiple fronts: total cost of ownership, software maintenance, electricity bill, lost performance on the machine doing the inference, more hallucinations, bugs, and lower-quality code to deal with, and slower iteration speed.

reply
Actually, yes. For example, I run local models for ingested documents, summaries, etc. The local models are fine, and there is no need for me to pay for tokens. Performance is adequate for that purpose as well. There are many other cases where I run at scale, time is flexible so things can move slower, and I'd rather keep it all in-house. I'm not even getting into areas where data cannot leave the premises for legal reasons. Right now I'm mostly limited to GPUs. But if that world of local models on Apple silicon is so "good", there is room to expand it to other fruits...
reply
> These models are dumber and slower than API SoTA models and will always be.

Sure, but you're paying per-token costs on the SoTA models that are roughly an order of magnitude higher than third-party inference on the locally available models. So when you account for per-token cost, the math skews the other way.
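
To make that arithmetic concrete, here's a rough sketch. The prices and the monthly token volume are made-up placeholders (not quotes from any real provider), picked only to reflect the "roughly an order of magnitude" gap; `cost_usd` and `breakeven_token_ratio` are hypothetical helper names.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a given price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million_usd


def breakeven_token_ratio(sota_price: float, open_price: float) -> float:
    """How many times more tokens the cheaper model can burn
    (retries, longer outputs) before it costs the same as the pricier one."""
    return sota_price / open_price


# ASSUMPTIONS: placeholder numbers for illustration only.
SOTA_PRICE_PER_M = 10.0   # assumed SoTA API price, USD per 1M tokens
OPEN_PRICE_PER_M = 1.0    # assumed third-party price for an open-weight model
MONTHLY_TOKENS = 500_000_000  # assumed workload size

sota_cost = cost_usd(MONTHLY_TOKENS, SOTA_PRICE_PER_M)
open_cost = cost_usd(MONTHLY_TOKENS, OPEN_PRICE_PER_M)
ratio = breakeven_token_ratio(SOTA_PRICE_PER_M, OPEN_PRICE_PER_M)

print(f"SoTA API:          ${sota_cost:,.2f} per month")
print(f"Open-weight model: ${open_cost:,.2f} per month")
print(f"Break-even: the cheaper model can use up to {ratio:.0f}x the tokens")
```

Under those assumptions the cheaper model only loses on cost if it needs more than about ten times the tokens to get the same job done. Your own time spent babysitting it isn't counted here.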

reply