It's hard to tell for sure because the local inference engines/frameworks we have today are not really that capable. We have barely started exploring the implications of SSD offload, saving KV-caches to storage for reuse, setting up distributed inference in multi-GPU setups or across a network, making use of specialty hardware such as NPUs, etc. All of these techniques can make use of fairly ordinary, run-of-the-mill hardware.
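For anyone curious what "saving KV-caches to storage for reuse" looks like in practice, here's a minimal sketch using Hugging Face transformers (the model ID, prompt, and file path are placeholders, and the exact cache object type varies between library versions):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "your/local-model"  # placeholder
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Prefill a long shared prompt once and capture the KV-cache.
    prompt = "Long shared system prompt..."
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, use_cache=True)

    # Persist the cache to SSD; under the hood it is just tensors
    # (plus a thin wrapper object in newer library versions).
    torch.save(out.past_key_values, "prefix.kv")

    # Later, possibly in another process: reload and keep decoding
    # without paying the prefill cost again.
    kv = torch.load("prefix.kv", weights_only=False)
    more_ids = tok(" plus the user's question", return_tensors="pt").input_ids
    with torch.no_grad():
        out2 = model(more_ids, past_key_values=kv, use_cache=True)

The point is that the cache is ordinary tensor data, so nothing stops an engine from spilling it to disk and reloading it; production-grade prefix caching in real inference servers automates this same idea.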
Since you need at least a few H100-class GPUs, I'd guess you need at least a few tens of coders to justify the cost.
I see the 512GB Mac Studios aren't for sale anymore, but that was a much cheaper path.