Try looking into the Ryzen AI Max 395. AMD made a CPU/GPU SoC with unified memory aimed specifically at AI inference, and you can buy mini PCs with up to 128 GB of RAM.
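As a rough illustration of why 128 GB of unified memory matters: the weights alone take about parameters × bits-per-weight / 8 bytes. This is a back-of-the-envelope sketch under my own assumptions, not anything from the thread. It treats 1 GB as 10^9 bytes and uses a hypothetical fixed `reserve_gb` to stand in for the OS, KV cache, and runtime overhead. In practice the KV cache grows with context length, and not all of the 128 GB may be usable by the GPU.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bytes per weight = billions of bytes = GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, reserve_gb: float) -> bool:
    """True if the weights plus a hypothetical fixed reserve fit in memory_gb."""
    return weight_memory_gb(params_billion, bits_per_weight) + reserve_gb <= memory_gb


if __name__ == "__main__":
    # Illustrative (params in billions, bits per weight) pairs; reserve_gb=16 is an assumption.
    for params, bits in [(8, 16), (70, 16), (70, 4), (120, 4)]:
        size = weight_memory_gb(params, bits)
        ok = fits(params, bits, memory_gb=128, reserve_gb=16)
        print(f"{params}B @ {bits}-bit: ~{size:.0f} GB of weights, fits in 128 GB: {ok}")
```

The point is mostly about quantization: a 70B model at 16 bits does not fit, while the same model at 4 bits leaves plenty of room.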

by krzyk 7 hours ago

Isn't CUDA/NVIDIA the go-to solution for most local models, with everything else a second-class citizen?

by gcr 6 hours ago

Depends. ROCm, for example, is pretty well supported.

Non-NVIDIA backends tend to get less support: new features land later, and optimizations that are supposed to improve performance sometimes end up hurting it instead. That sort of thing.

For basic “token in, token out” inference without fine-tuning, it's probably fine?

by simple10 8 hours ago

The 128 GB Ryzen AI Max 395 is super cool, but it's not fast for inference: roughly an order of magnitude slower than a dedicated GPU, at about half the cost. You can run larger models on it, but slowly. Great for local async work; not great as a daily chat machine or for driving a coding agent.

by throwa356262 8 hours ago

The latest NPUs are pretty fast; I think what's missing is better-optimised software support.

by plagiarist 7 hours ago

VRAM bandwidth is at least as much of a problem as compute on these chips; there's a lot of data to shuffle around.
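To make the bandwidth point concrete: when generating one token at a time, a dense model has to stream roughly all of its weights from memory for every token, so memory bandwidth caps decode speed at about bandwidth / weight size. Below is a minimal sketch of that rule of thumb. The bandwidth values are made-up placeholders, not specs for any particular chip, and the estimate ignores KV-cache reads, compute limits, and batching.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on single-stream decode speed for a dense model,
    assuming every generated token reads all weights from memory once."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    weights_gb = 35.0  # e.g. a 70B dense model at 4 bits per weight
    # Hypothetical bandwidths, chosen only to show how the ceiling scales.
    for name, bandwidth in [("slower memory", 250.0), ("faster memory", 1000.0)]:
        ceiling = max_decode_tokens_per_s(bandwidth, weights_gb)
        print(f"{name} ({bandwidth:.0f} GB/s): at most ~{ceiling:.1f} tokens/s")
```

Quadrupling bandwidth quadruples the ceiling, which is why bandwidth can matter as much as raw compute here. For a mixture-of-experts model, only the active experts' weights are read per token, so `weights_gb` would be roughly the active-parameter footprint rather than the full model size.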

by varispeed 11 hours ago

A comparable non-Mac setup would probably be a Threadripper, but that gets much more expensive. In my view, Apple products are actually the cheapest option on the market for the performance you get.