> As long as that is allowed to happen
It won't be. Only we can provide that, and only for ourselves.
Having said that, I don’t think it’s all that tempting for companies at all, considering this whole market is developing rapidly and it’s nearly impossible to predict where we’ll be at in a year or two.
It's not like you'd lose capabilities, if anything this solution just gets better with time.
I think you’re better off renting GPU instances and running all the software on those. It’ll be cheaper than Anthropic and OpenRouter but slightly more expensive than electricity and depreciation of hardware.
The number of parameters needed for these open weight models has mostly stabilized so the actual memory requirements aren't likely to change all that much.
TPS = active weights in GB / your memory bandwidth.
That’s it for decode. That’s all.
You need something like 6x RTX Pro 6000 at $11800 each plus a nice server (add $10000) = $80800 and then quite a bit of electricity.
For who pays for it, obviously the employer would.
For "how did I arrive at this number" Ballpark estimate from what I know about part cost. Most of that money will go towards AI cards about $5k for the mb, cpu, power supply, etc. $45k would be for as much ram and as big/expensive nVidia cards as you can get your hands on. The B300 has 288GB of VRAM in it. Probably what you'd be after.