upvote
> it will be really slow (multiple seconds per token!)

This is fun for proving that it can be done, but that's 100X slower than hosted models and 1000X slower than GPT-Codex-Spark.

That's like going from real time conversation to e-mailing someone who only checks their inbox twice a day if you're lucky.

reply
At a certain point the energy starts to cost more than renting some GPUs.
reply
Yeah, that's hard to argue with. I just go to OpenRouter and play around with a lot of models before deciding which ones I like. But there's something special about running it locally in your basement.
reply
Aren't decent GPU boxes in excess of $5 per hour? At $0.20 per kWh (which is on the high side in the US), running a 1 kW workstation around the clock costs about $4.80 a day, so a full day of local compute costs roughly the same as one hour of rented GPU time.

The issue you'll actually run into is that most residential housing isn't wired for much more than ~2 kW per room (a standard 15 A, 120 V circuit tops out at 1.8 kW).
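A quick sketch of the arithmetic above, using the figures assumed in this comment ($5/hr rented GPU, $0.20/kWh electricity, a 1 kW workstation):

```python
# Back-of-the-envelope cost comparison; all rates are the
# assumptions from the comment above, not measured figures.
GPU_RATE_PER_HOUR = 5.00   # $/hour for a rented GPU box
ELECTRICITY_RATE = 0.20    # $/kWh, high side for the US
WORKSTATION_KW = 1.0       # steady draw of the local machine

# Cost of running the workstation around the clock for one day.
daily_local_cost = WORKSTATION_KW * 24 * ELECTRICITY_RATE

# How many rented GPU-hours the same money would buy.
equivalent_gpu_hours = daily_local_cost / GPU_RATE_PER_HOUR

print(f"One day of local power: ${daily_local_cost:.2f}")
print(f"Equivalent rented GPU time: {equivalent_gpu_hours:.2f} h")
```

At these rates a full day of local electricity (about $4.80) buys just under one hour of rented GPU time, which is the break-even the comment is pointing at.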

reply