There was an article on HN a few weeks ago where someone detailed how they got an old datacenter GPU to run in their consumer PC, with decent performance on Qwen. He spent something like $200 on the GPU (second-hand, of course).

So yeah, I think models on local hardware will be quite common soon among the tech savvy (such as people creating software).

reply
Especially considering the millions of 2026-class data center GPUs that massively overinvested companies are currently buying, which will be obsolete in a few years.
reply
I think you are right when you factor in the much more efficient newer high-end GPUs. That is what may make the current GPU investments obsolete in 2-4 years.
reply
I think those are going to be run until they die. The capex is too high relative to the opex to retire them after a few years. They'll keep serving current-gen LLMs for as long as they keep running.
reply
It won't make sense to run them after two years. The vendors will be limited on datacenter space, power and cooling, and there will be new hardware available that will run the same models at a fraction of the power.

A100 -> H100 was >3x tokens per joule, H100 -> B200 >10x. There is still significant low-hanging fruit in architectural efficiency, and the vendors are chasing it.

This is the big risk for AI companies that I feel is not being sufficiently priced in. Almost none of the investments they are making are durable; the depreciation schedules for everything but the real estate should be less than 24 months. Until the hardware is stable enough that you only get double-digit % increases per generation, it should almost be counted as opex.
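
To put rough numbers on the efficiency argument, here is a back-of-the-envelope sketch. It only uses the ratios quoted above (>3x and >10x, taken at their lower bounds); the baseline tokens-per-joule figure and the electricity price are made-up placeholders, not measurements.

```python
# Rough sketch: electricity cost per million tokens across GPU generations.
# Only the efficiency ratios come from the comment above; the baseline
# throughput-per-joule and the power price are hypothetical placeholders.

JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6e6 J


def energy_cost_per_million_tokens(tokens_per_joule: float, usd_per_kwh: float) -> float:
    """Return the USD electricity cost of generating one million tokens."""
    joules = 1_000_000 / tokens_per_joule
    return joules / JOULES_PER_KWH * usd_per_kwh


BASELINE_TOKENS_PER_JOULE = 1.0  # hypothetical A100 figure, not a measurement
USD_PER_KWH = 0.10               # hypothetical electricity price

relative_efficiency = {
    "A100": 1.0,
    "H100": 3.0,          # ">3x" over A100, lower bound
    "B200": 3.0 * 10.0,   # ">10x" over H100, lower bound
}

costs = {
    name: energy_cost_per_million_tokens(BASELINE_TOKENS_PER_JOULE * ratio, USD_PER_KWH)
    for name, ratio in relative_efficiency.items()
}

for name, cost in costs.items():
    share = cost / costs["A100"]
    print(f"{name}: ${cost:.4f} per million tokens ({share:.1%} of the A100 energy bill)")
```

Whatever the real baseline is, the ratios are what matter: if they hold, the oldest card burns at least 30x the energy per token of the newest one, which is the whole case for counting power and rack space against keeping it.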

reply
They can also be used for things other than running the main frontier model.

E.g. grok isn't truly multi-modal, it has a callable tool that is a separate VLM it invokes on image URLs or files (for a long time it was grok-1.5v, but I think they have upgraded now, it was pretty bad).

And then you have the small summarizer models for the CoT/thought traces, the guidable summarizer models for the standard browse tools, etc.

There's a ton of stuff that can use an aging GPU.

reply
Yes, sure, but not efficiently. Even Pops will not want to run four hair-dryer GPUs 24/7 in the garage.
reply
H100s were released in Oct 2022. They are now more expensive than at release time.
reply
Indeed, and with some tinkering around the harness it can even punch way above its weight.
reply
> You need a system with either 32+ GB VRAM

I do hope you're right that it will get cheaper over time (it should), but right now 32GB of VRAM is not affordable to a lot of people. You're talking ~$4500 just for the GPU, or around $800 used if you can find one.

reply
For inference you can split the 32GB between two 16GB cards. Two new 5060 Tis for ~€1000 in total are more than fine.

It's a tad less efficient and a bit more of a hassle, but still a good experience for only a fraction of the price.
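
If you want to sanity-check whether a given model fits before buying, a rough estimate looks like the sketch below. The function name, the 2GB per-card overhead (runtime, KV cache, buffers) and the 32B / 4-bit example are all my own assumptions, and it pretends the weights can be divided freely across cards, which real runtimes only approximate.

```python
# Rough VRAM fit check for splitting model weights across several cards.
# fits_on_cards, the per-card overhead and the example model size are
# hypothetical; real usage depends on the runtime, context length and
# how evenly layers can be distributed.

def fits_on_cards(params_billion, bits_per_weight, card_vram_gb, overhead_gb_per_card=2.0):
    """Return (weights_gb, usable_gb, fits) for the given set of cards."""
    # 1e9 parameters * (bits / 8) bytes each = that many decimal GB.
    weights_gb = params_billion * bits_per_weight / 8
    # Reserve some memory on every card for runtime overhead.
    usable_gb = sum(max(vram - overhead_gb_per_card, 0.0) for vram in card_vram_gb)
    return weights_gb, usable_gb, weights_gb <= usable_gb


# Hypothetical 32B-parameter model quantized to 4 bits per weight.
for cards in ([16], [16, 16]):
    weights, usable, ok = fits_on_cards(32, 4, cards)
    verdict = "fits" if ok else "does not fit"
    print(f"{len(cards)} x 16GB: {weights:.0f}GB of weights vs {usable:.0f}GB usable -> {verdict}")
```

With those assumptions the example model needs 16GB for weights alone, so it misses on a single 16GB card and fits with room to spare on two.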

reply
A Mac laptop can be had with 32GB of RAM for far less than $4500. Not sure if they actually need 32GB of discrete GPU RAM. My Mac laptop does run Qwen at a reasonable speed.
reply