The Framework Desktop is the closest one, with the Ryzen AI MAX 385/395 chips. It's mostly about the memory being fast enough rather than just CPU/GPU oomph.

The 64GB model is 2240€ base and the 128GB is 3069€ base + all the stuff you need to add to make it an actual computer.

As a comparison the 64GB Mac Mini is 2499€ here and a 128GB Mac Studio is 4274€.

reply
Note though that a MAX 395 has half the memory bandwidth of an M4 Max chip, and memory bandwidth is going to be the biggest limiting factor, so you'll likely be getting around half the tokens/second with that Framework Desktop.
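The bandwidth argument comes down to back-of-envelope arithmetic: token generation is memory-bound, so the ceiling is roughly bandwidth divided by the bytes read per token (about the quantized model size for a dense model). A minimal sketch, where the bandwidth figures and the 40GB model size are illustrative assumptions, not measured numbers:

```python
# Decode speed on memory-bound hardware (rough upper bound):
#   tokens/sec ~= memory bandwidth / bytes touched per token
# For a dense model, bytes per token is roughly the quantized model size.
# All figures below are illustrative assumptions.

def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound decode rate; ignores compute, KV cache, and overhead."""
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 40  # e.g. a ~70B-parameter model at 4-bit quantization (assumption)

for name, bw in [("MAX 395-class (~256 GB/s)", 256),
                 ("M4 Max-class (~546 GB/s)", 546)]:
    print(f"{name}: ~{tokens_per_sec(bw, MODEL_GB):.1f} tok/s ceiling")
```

Halving the bandwidth halves the ceiling, which is where the "around half the tokens/second" estimate comes from; real-world numbers land below either ceiling.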
reply
There's a reason why it's cheaper than the Mac equivalent and it's not all because of Apple's premium pricing =)

But it's still the easiest and cleanest way to get decent local AI speeds on a non-Mac.

reply
Not even close. If you want to run this on PCs you need a GPU like a 5090, but even then the cost per token isn't comparable, and it will be less reliable and draw a lot more power. Right now the Apple Silicon machines are the most cost-effective per token and per watt.
reply
It's odd no manufacturer jumped on this bandwagon to offer a competitive alternative.
reply
Is there even enough market for this?

These models are dumber and slower than API SoTA models, and always will be.

My time and sanity are worth far more than insurance against whatever risk there is in sending my garbage code to companies worth hundreds of billions of dollars.

For most, using local models is a downgrade on multiple fronts: total cost of ownership, software maintenance, electricity bill, losing performance on the machine doing the inference, having to deal with more hallucinations/bugs/lower-quality code, and slower iteration speed.

reply
Actually yes. For example, I run local models for ingested documents, summaries, etc. The local models are fine, and there is no need for me to pay for tokens. Performance is adequate for that purpose as well. There are many other cases where I run at scale, time is flexible so things can move slower, and I'd rather keep it all in-house. I'm not even getting into areas where data cannot leave the premises for legal reasons. Right now I'm limited mostly by GPUs. But if that world of local models on Apple silicon is so "good", there is room to expand it to other fruits...
reply
> These models are dumber and slower than API SoTA models and will always be.

Sure, but you're paying per-token costs on the SoTA models that are roughly an order of magnitude higher than third-party inference on the locally available models. So when you account for per-token cost, the math skews the other way.

reply
Intel's doing interesting things with their Arc GPUs. They're offering GPUs that aren't super fast for gaming but are relatively low power and have a boatload of VRAM. The new B70 is half the retail price of a 5090 (probably more like a third or a quarter of actual 5090 street prices) but has the same amount of memory and half the TDP. So for the same price as a 5090 you could get several and use them together.
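Whether several mid-range cards can stand in for one big one is mostly a does-the-model-fit question, since layers can be split across cards. A sketch of that arithmetic, where the capacities and per-card overhead are illustrative assumptions:

```python
# Does a model fit across N cards? Aggregate VRAM minus per-card overhead
# (framework buffers, KV cache headroom). All numbers are assumptions.

def fits(model_gb: float, cards: int, vram_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """True if the model weights fit in the usable VRAM across all cards."""
    return cards * (vram_gb - overhead_gb) >= model_gb

# A ~70 GB model vs. one 32 GB card and vs. three of them:
print(fits(70, 1, 32))  # single card: doesn't fit
print(fits(70, 3, 32))  # three cards: fits, ignoring interconnect slowdown
```

The catch is that splitting across cards adds transfer overhead between layers, so aggregate VRAM buys capacity more than it buys speed.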
reply
I wonder if the Snapdragon X Elite has already caught up with Apple's M series in that regard - does anybody know?
reply