undefined

points

[-]

Well, if you are making side-hustle money now using online models that, critically, you could also run at home, then it sounds like it’s just a matter of numbers. Oh and, unless you spend a lot more than 5k, your local model will still be slower than the online model. What’s your estimated ROI?

Assuming that’s not true based on your phrasing, you’d be shooting yourself in the foot. Start using online models with the same quant at least benchmark as what you could run at home. Prepare for the at home model to be slower.

by dominotw4 hours ago|

parent|

[-]

no one is making money side-hustling ai models. This is like reddit wet dream. get real, dont get scammed by ppl selling you these dreams.

by cpburns20092 hours ago|

prev|

[-]

Mac, DGX Spark, and a Framework Desktop / Ryzen AI Max 395 (ie Strix Halo) will not give you great performance running LLMs. One benefit of the Spark over the others is you can easily link up to 4 of them. Only MoE (sparse) models will be usable. Even if you can run some massive models, they will crawl. You're better off running one or more GPU cards.

by ericd6 hours ago|

prev|

[-]

You probably want to try renting some time on a dedicated box with roughly the specs you’re considering and running the open models for a bit to see if you would actually use them before dropping a lot on local hardware. A 128 gig MacBook Pro isn’t going to get you an amazing model, and certainly not amazing speed. GLM 5.2 wants something like 350+ gigs at fp4 iirc.

by zackify4 hours ago|

parent|

[-]

I ran glm 5.2 on rented 8x h200 it could only do 2x concurrency at a cost of $40 an hour. It felt great but dang I wish it was cheaper... It needs 750 at fp8

by zackangelo2 hours ago|

parent|

[-]

what was the concurrency limitation? that node should be able to support a lot more

by traceroute665 hours ago|

parent|

prev|

[-]

> You probably want to try renting some time on a dedicated box with roughly the specs you’re considering and running the open models

You don't even need to go that far. For example, with Exoscale Dedicated Inference[1] you just point it at the Hugging Face for the model and quantisation you want to test and it automagically spits out an OpenAI-compatible API endpoint.

[1] https://www.exoscale.com/ai-cloud-infrastructure/dedicated-i...

(I have no relationship with Exoscale, this particular product just crossed my radar recently)

by hgoel5 hours ago|

parent|

[-]

I think they're just suggesting renting as a way to test that the hardware they're considering purchasing would actually be able to do what they need.

by traceroute664 hours ago|

parent|

[-]

> I think they're just suggesting renting as a way to test

Well, yes, I understood that.

Which is why I started with the words "You don't even need to go that far.".

To re-phrase what I said in clearer terms:

Instead of renting an instance, then messing around with configuring Linux and whatever via SSH or Ansible or whatever. Just point a Hugging Face link at this magic service and get a ready-to-go API back. Enabling you to test your desired model spec with minimum fuss.

Ultimately the guy wants his own hardware. So why waste time messing around with someone else's VM if you just want to test a specific model spec. That is the TL;DR.