Right now the speed isn't good for GLM 5.2, but DeepSeek V4 Flash speed is okay for me (actually reading the output) and quite usable. See kyuz0's great recent video here: https://www.youtube.com/watch?v=PkKXm_mKCCM
With a bit more speed and model improvements, local AI becomes a reasonable practical thing! The biggest problem is all the tech companies making consumer hardware completely unaffordable, and I don't think this is accidental. Look at Micron's profits and share price lately...
I got my Strix machines for ~2k eur each, best computers this 90s kid has ever owned, but those days are gone :(
The device was not perfect by any means, but the ability to run fairly large models is some kind of magic.
It's not even worth it at that point.
You can get a used enterprise-grade SXM baseboard with 4-8 V100/A100 GPUs off eBay at a similar price. That will even get you actual HBM and NVLink, along with 10x the AI performance, assuming you don't care about your electricity bill of course.
I didn't get a Strix Halo laptop because it was the best bang per buck, I got it because it was an awesome machine that could do a little bit of everything, fit in a backpack and only needed 140W.
But no one should buy one at 7899, obviously. It was a tough sell for me at the old 2800 pricing.
The cheapest 128GB MacBook Pro here costs €7.949,00.
No doubt better value than the HP, and it will depreciate a lot less quickly, but it's just as expensive. Unfortunately, not being able to run Linux is a deal-breaker for me.
It isn't a problem for me, more amusing than anything else (I run in Low Power mode 90% of the time), but worth knowing for anyone that's thinking about pushing the hardware to its limit 24/7.
https://www.ebay.com/itm/157742745616
It's always a gamble buying used electronics, but per the description it's a fully decked-out server with 256GB VRAM.
Though at 2U it is going to sound like a 747 taking off from your office.
https://www.ebay.com/itm/336632412718
Buy one and let me use it when you're sleeping.
Or maybe you're right, I originally remembered 2k as well. I wanted to wait for the AI Max+ 395 upgrade of my laptop, and now it makes no sense to upgrade.
Only if you pay the Framework premium.
https://www.bosgamepc.com/products/bosgame-m5-ai-mini-deskto...
I don't have access to the USD price, but it's 2500€ (tax included), up from 1600€ in November when I ordered mine.
[Tangent: all my life I've been downvoted into a smoking hole in the ground, particularly on reddit r/hardware, for questioning the wisdom of laptops for high performance computing, including gaming. Everyone insists they need the mobility, and then just leave it plugged in the whole time, absolutely refusing to admit it's about aesthetic preference.]
Honestly I think this is just a bad time to be buying hardware - everything is marked up an insane amount that doesn't really make sense.
With the laptop you probably won't get silent operation at the peak 100-140W, i.e. you've now massively overpaid for lower performance.
The ones I've seen on aliexpress are from unknown Chinese vendors.
I was also a bit wary about Bosgame but TBH they've been great and the machine is rock solid, if a little noisier than and not as pretty as the FD. You can just buy from them directly and be fine, best computer deal out there by a mile.
It's like you are advocating for public transport instead of a personal car, but when asked how to get to a place that isn't serviced by the public network, your solution is to rent a bus.
In 1-3 years the hardware crunch will be over, local distilled models will provide Opus 4.8 like intelligence, and the hardware will exist to provide usable performance.
You realize "tech companies" isn't a monolith? Micron charging inflated prices doesn't magically benefit OpenAI. The "high prices keep out competitors" theory doesn't make much sense either. It's like saying Denny's benefits from higher egg prices because it makes cooking eggs at home more expensive.
I think that realistically, companies compete against each other individually, but against smaller companies and individuals they act more like cartels/monopolies, and that's what OP is referring to in terms of hardware purchasing/contracts/pricing. This also extends outside of tech to investing, so it's likely not just tech responsible for this.
It’s classic capex vs opex. I’d keep paying my openai subscription instead of dropping $3k to run a subpar model. If the thing costs $1k I would consider it.
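To put rough numbers on the capex vs opex trade-off, here is a minimal sketch of the break-even calculation. The $3k and $1k hardware prices are from the comment above; the $20/month subscription fee and the function name are placeholder assumptions, and electricity is ignored unless you pass it in.

```python
def break_even_months(hardware_cost: float,
                      monthly_subscription: float,
                      monthly_power_cost: float = 0.0) -> float:
    """Months until buying hardware costs less than continuing to subscribe.

    All inputs are illustrative; this ignores resale value, model quality
    differences, and the time value of money.
    """
    monthly_saving = monthly_subscription - monthly_power_cost
    if monthly_saving <= 0:
        # Running the box costs as much as the subscription: never breaks even.
        return float("inf")
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    SUBSCRIPTION = 20.0  # assumed monthly fee, not from the thread
    for price in (3000.0, 1000.0):
        months = break_even_months(price, SUBSCRIPTION)
        print(f"${price:.0f} box: break-even after {months:.0f} months")
```

With those placeholder inputs the $3k box takes 150 months to pay for itself and the $1k box takes 50, which is roughly why the lower price point changes the decision.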
Have they? Aren't they doing a massive datacenter build-out right now? Moreover the massive profits for Micron and Nvidia must be coming from somewhere, and I doubt it's price-sensitive consumers.
I'm working on a three-node Strix Halo agentic OS factory designed to be maintained by local agents: https://github.com/projectbluefin/testing-lab
This memory bandwidth combo is amazing for homelabbers. kyuz0's work on these containers has made the investment in this kit so valuable. I hope Framework is sending you hardware!
https://projectbluefin.io/server/ is what I'm hoping to ship, designed to just ship setups like this ootb. Things like this would be so much harder without kyuz0!
(Note: The 64GB ones are going for $1700-ish empty, and the prices on the 128s are outrageous. We can just keep making the labs more deterministic over time!)
I do hope that Apple opens up RDMA for their TB4 machines... ds4 using TB5 Macs works great, but there are a lot of capable TB4 (M1/M2) machines out there and afaik there's no hardware limitation preventing RDMA from working (at lower bandwidth, but with the latency gains!) on the older stuff.
Would love to see DeepSeek V4 Flash/Pro and MiniMax M3 benchmarks, but already these are pretty impressive. First Strix Halo setup I've seen with some serious performance.
EDIT: Apologies - I think I misunderstood these benchmarks - it seems this is actually very slow when compared to an M4 or M5 chip with a good amount of memory. Looking at the creator's video here: https://youtu.be/Cfl3TS7ME5s?t=734 -- it seems the performance of Strix Halo is much, much slower than what I get on my M4 MBP, which gets ~400 tok/s prefill and ~20 tok/s generation.
That's the problem with these AMD laptop-class cores, they have very little IO. They have been saying they will release in a desktop form factor, but then it probably won't have such good memory bandwidth...
The Nvidia boxes have 200Gb Ethernet, which is much more useful for clustering.
Another note: In my experience, RoCE works much better on the CX4+ generation. CX3 is best with InfiniBand. I think some firmware on the CX3 generation has a messed-up config for RoCE. But running InfiniBand is not a complex task. It's way easier than people think, like 10x easier and faster to set up than Ethernet.
- 2x Framework Desktop AI Mainboards with 128GB of RAM for $3150 each
- 2x 100G Ethernet controllers for ~$500 each
So the Framework board has a single PCI-e 4.0 x4 slot, which amounts to 8GB/s or 64Gbps theoretical, so you're not getting 100G. Also, the 100G cards all seem to be PCI-e x16 cards for obvious reasons, so you need a riser or an adapter or something to even get them to work.
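For anyone who wants to check the slot arithmetic, here's a quick sketch. The per-lane transfer rates and 128b/130b encoding are the standard PCIe figures; the helper name is made up for illustration, and real-world throughput will be lower still once packet overhead is counted.

```python
# Back-of-the-envelope PCIe link throughput (one direction, before
# packet/protocol overhead). pcie_gbps is a hypothetical helper name.

PCIE_GEN = {
    # generation: (GT/s per lane, line-encoding efficiency)
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_gbps(gen: int, lanes: int) -> float:
    """Line rate in Gbit/s for a PCIe link of the given generation and width."""
    gt_per_lane, efficiency = PCIE_GEN[gen]
    return gt_per_lane * lanes * efficiency


if __name__ == "__main__":
    slot = pcie_gbps(4, 4)    # the x4 slot described above
    wide = pcie_gbps(4, 16)   # an x16 link, for comparison
    print(f"PCIe 4.0 x4:  {slot:.1f} Gbit/s = {slot / 8:.2f} GB/s")
    print(f"PCIe 4.0 x16: {wide:.1f} Gbit/s = {wide / 8:.2f} GB/s")
    print(f"Share of 100GbE line rate through x4: {min(slot, 100.0) / 100.0:.0%}")
```

So the x4 slot lands at about 63 Gbit/s after encoding, i.e. a 100G card in that slot tops out at roughly 63% of line rate at best.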
I don't know how hot a 100GbE copper NIC runs but, from experience, 10GbE NICs have been basically giant heatsinks. So fiber might be advisable, and I expect short fiber cables here probably aren't cost-prohibitive given everything else.
As an aside, if you are using Ethernet for clustering and you're clustering 2 devices, in an ideal world you'd be using simplex Ethernet but that's not an option here.
I wonder if the author considered USB4 for clustering? I ask because I know people who have clustered Mac Studios over TB5 and that bandwidth is up to 120Gbps. The version of USB4 on the Ryzen AI Max+ 395 seems to be 40Gbps (5GB/s), which isn't that far off the 8GB/s of PCI-e 4.0 x4.
But the limiting factor with Strix Halo (and DGX Spark for that matter) is memory bandwidth, both under 300GB/s. The obvious comparison is to the Mac Studio. Unfortunately the largest spec they currently sell is 96GB; it had been as high as 512GB. And 96GB is $6700+, but you're also getting way better performance AFAICT, e.g. [1]. The M3 Ultra has ~900GB/s memory bandwidth.
You can alternatively buy a MacBook Pro with M5 Max and 128GB of RAM (now $8000, was $5500-6000 a few days ago) but that tops out at ~600GB/s, which is still double these mini AI boxes.
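A rough way to see why memory bandwidth is the limit: during generation, every active weight has to be read from memory once per token, so bandwidth divided by bytes read per token gives a ceiling on tok/s. The sketch below uses the bandwidth figures quoted above; the model (30B active parameters at 4-bit) is a made-up example, and real numbers will come in under this ceiling.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_b: float,
                         bytes_per_weight: float) -> float:
    """Upper bound on generated tokens/s for a memory-bound decoder.

    Assumes each active weight is read exactly once per token and ignores
    KV-cache reads, compute limits, and any caching effects.
    """
    gb_read_per_token = active_params_b * bytes_per_weight  # billions of params * bytes = GB
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # Bandwidths as quoted in the thread (the first is only an upper bound).
    machines = {
        "Strix Halo / DGX Spark (<300GB/s)": 300.0,
        "M5 Max (~600GB/s)": 600.0,
        "M3 Ultra (~900GB/s)": 900.0,
    }
    # Hypothetical model: 30B active parameters, 4-bit weights (0.5 bytes each).
    for name, bandwidth in machines.items():
        ceiling = decode_ceiling_tok_s(bandwidth, active_params_b=30.0, bytes_per_weight=0.5)
        print(f"{name}: at most ~{ceiling:.0f} tok/s")
```

The ratio between machines is the useful part: whatever the model, 2x or 3x the bandwidth means roughly 2x or 3x the generation ceiling.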
Oh and if you don't want to go the way of these Framework motherboards, you can buy a whole 128GB Strix Halo PC for $3k or less.
I think the main point here though is we're only a few years away from running 300B+ (or even 1T+) param models at useful speeds on enthusiast hardware.
[1]: https://www.reddit.com/r/LocalLLaMA/comments/1u5mfaq/you_can...
AI + 100GbE (under load) + tiny box = unreliable and dead very quickly.
And could you not use something like an N5 + iSCSI for storage?
Why do you do all this? To avoid collisions and the loss of effective bandwidth from back-offs.
It only really works with 2 computers because if you add a 3rd, now you need 12 NICs instead of 4 for unidirectional point-to-point connections.
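The NIC count follows from needing one one-way link per ordered pair of machines, with a NIC on each end of every link. A small sketch of that counting (function names are just for illustration):

```python
def directed_links(nodes: int) -> int:
    """One unidirectional link for every ordered pair of nodes."""
    return nodes * (nodes - 1)


def simplex_nics(nodes: int) -> int:
    """Each one-way link needs a dedicated NIC on the sending and receiving side."""
    return 2 * directed_links(nodes)


if __name__ == "__main__":
    for n in (2, 3, 4):
        total = simplex_nics(n)
        print(f"{n} nodes: {directed_links(n)} one-way links, "
              f"{total} NICs total, {total // n} per node")
```

That gives 4 NICs for two machines and 12 for three, and it keeps growing quadratically (24 for four), which is why this only makes sense for a pair.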