Yeah, I am more or less done with these big providers. I'm running local primarily now. The constant screw-ups, the not caring about customers, the political issues, it's just not worth it for me. I get that some people are hooked on vibe coding, but with the latest wave of small models I'm good for my needs.
reply
What do you use now? How much RAM do you have? I am increasingly thinking of doing that.
reply
Well, about 4 weeks ago I was mostly running small models. Some of my favorites were DeepSeek R1 8B and Qwen 3.5 9B. Those are more or less good for boilerplate, with super fast responses (what I cared about most).

Now I am still trying out all the models that dropped this month. I am running Qwen 3.6 35B A3B on an RTX 4060 Ti with 16GB of VRAM.
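
Quick math on why that fits at all (every number below is a rough assumption, not a measurement): a ~35B-parameter model at ~4-bit quantization doesn't actually fit in 16GB, so part of it gets offloaded to system RAM.

    # Back-of-envelope VRAM math for a ~35B-parameter model, 4-bit quantized.
    # All numbers are assumptions for illustration, not measurements.
    params = 35e9             # total parameters
    bytes_per_param = 0.5     # ~4-bit quantization (Q4-class GGUF)
    kv_cache_gb = 1.5         # assumed KV cache for a few K tokens of context

    weights_gb = params * bytes_per_param / 1e9      # ~17.5 GB of weights
    total_gb = weights_gb + kv_cache_gb              # ~19 GB overall
    offload_gb = max(0.0, total_gb - 16)             # ~3 GB spills to system RAM

    print(f"~{weights_gb:.1f} GB weights, ~{offload_gb:.1f} GB offloaded on a 16 GB card")

The MoE design is what makes the offload tolerable: with only ~3B parameters active per token, the layers sitting in system RAM cost far less than they would on a dense 35B model.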

I wish I'd sprung for a 24GB VRAM card, but I never thought the price difference would matter. It seems like it does, and I bet there will be more models at this size in the future, because the quality at this size is crazy.

It's not as good as Opus if you are doing completely hands-off programming, but it's completely fine for me. I mostly use it for autocomplete or templating a class. Other people are using it for agentic workflows with success.

Check out /r/localllama for more experiences. My setup is not the best, but it is working for me and is saving me money.
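
If anyone wants to see how little glue the autocomplete/templating workflow needs: most local servers (llama.cpp's llama-server, Ollama, etc.) expose an OpenAI-compatible endpoint. A minimal sketch in Python; the port and model name are assumptions from my setup, so swap in whatever yours exposes:

    import requests

    # Completion call against a local OpenAI-compatible server
    # (e.g. llama.cpp's llama-server; port 8080 is an assumption,
    # use whatever your local server actually listens on).
    def complete(prompt: str, max_tokens: int = 128) -> str:
        resp = requests.post(
            "http://localhost:8080/v1/completions",
            json={
                "model": "local",       # most local servers accept any name here
                "prompt": prompt,
                "max_tokens": max_tokens,
                "temperature": 0.2,     # low temperature for boilerplate
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["text"]

    # Templating a class from a one-line prompt:
    print(complete("Write a Python dataclass for a 2D point with float x and y:\n"))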

reply
> My setup is not the best, but it is working for me and is saving me money.

I've got a local setup too, but unless you consider the hardware zero-cost, there is really no way to save money. The class of model you can run on <$5k of hardware is dirt cheap to run in the cloud: generating tokens 24/7 non-stop costs a few dollars a day at most, possibly even less than the electricity to do it at home.
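
To put rough numbers on that (every figure below is an assumption for illustration, not a quote from any provider):

    # Back-of-envelope: 24/7 cloud generation vs. home electricity.
    # All numbers are assumptions for illustration.
    tokens_per_sec = 50              # assumed sustained generation rate
    usd_per_m_tokens = 0.20          # assumed API price for a small model
    tokens_per_day = tokens_per_sec * 86_400
    cloud_per_day = tokens_per_day / 1e6 * usd_per_m_tokens   # ~$0.86

    watts = 250                      # assumed whole-system draw under load
    usd_per_kwh = 0.15               # assumed electricity price
    home_per_day = watts / 1000 * 24 * usd_per_kwh            # ~$0.90

    print(f"cloud: ~${cloud_per_day:.2f}/day for {tokens_per_day/1e6:.1f}M tokens")
    print(f"home:  ~${home_per_day:.2f}/day in electricity alone")

At those assumed rates, the electricity alone roughly matches the API bill, before the card's purchase price even enters into it.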

reply
There's truth to that. But I already had the card for other purposes, and I don't have to egress or ingress anything. I love having it all local to me. I also love that I can sell the card later. Funny thing: my GPU has gone up in price, so I might even have made money.
reply