And I can't say that I won't switch to OpenRouter (even just for the same models) at some point.
But one of the things I have found about my own learning process is that some lessons only come to you when you make yourself available to them. And if that means doing things the difficult way, that is what you should do.
The rest of my life is ultra-frugal so I am relaxed about this.
I spent a good weekend learning how to perform latent steering by playing with PyTorch and a local Gemma4 model, and there is no way I could have grokked any of it the way I did without hands-on time.
This is on an M3 Max 36GB I've had for a couple of years. No further outlay needed.
I don't know if it has changed my mind about a career change but as I am sure you can understand, I no longer feel like I am running away defeated.
My very best wishes to you :-)
The interesting question is whether that gap will narrow, and if so, how much, and on what timescale.
The exact answer to this question is not knowable, but if you are the kind of person who comes to a site called "Hacker News", and you think there is a nonzero chance that the answer is yes, the gap will narrow and this won't always be an expensive toy, then now seems like a pretty great time to get in the game and start exploring the capabilities.
(sarcasm, btw)
Over the long term, it's always been better to buy than to rent: even if renting is technically more efficient in GPU utilization, buying means you don't have to pay a hosting provider's profit margin.
And for users that aren't running multiple agents 24/7, you should be able to fit a good user:GPU ratio.
For example (and relevant to AI), I can generate electricity on my roof at $0.20-0.25/kWh, batteries included. In California, the electric utility can't offer it for less than $0.30-0.50/kWh. So in this case, electricity produced at scale is actually more expensive.
There are many such examples.
Right now, there is way more scale in centralized AI than there is at the edge. But that could flip. I'd still probably put the probability that it will at under 50%. But I'd also put it above zero!
What makes you so certain that economies of scale won't work the opposite way you imagine? E.g., if model improvement tapers off, but RAM costs decline (hard to believe atm, but historically likely), then eventually everyone will be able to run SOTA models on their personal hardware.
Heck, even if model sizes simply grow more slowly than RAM costs decrease, the same would happen.
I do realize the cloud is just someone else's computer, right? Power goes in, tokens and heat come out, just in another place.
That's never the point of keeping local alternatives though.
For me this dates all the way back to installing Slackware 1.0 (0.99pl12!) on an offline 486SX rather than just using the internet-connected workstations in the lab.
Here, I already had a Mac that was powerful enough to run a local LLM, so now I do, because I can.