Seems like they're making some effort in that direction at least. If you have specific concerns, maybe try hitting up Anush Elangovan on Twitter?
Hmmm
https://rocm.docs.amd.com/en/latest/compatibility/compatibil...
I suspect, given AMD's relative openness vs. nvidia, even consumer-level stuff released today will end up with a longer useful life than current nvidia stuff.
I could be wrong, of course. I've taken the gamble...the last nvidia GPU I bought was a 3070 several years ago. Everything recent has been AMD. It's half the price for nearly competitive performance and VRAM. If that bet turns out wrong, I'll just upgrade a little sooner and still probably end up ahead. But, I think/hope openness will win.
Also, nvidia graphics drivers on Linux are a pain in the ass that I didn't want to keep dealing with. I decided it wasn't worth the hassle, even if they're better on some metrics. I've been able to run everything I've tried on an AMD Strix Halo and an old Radeon Pro V620 (not great, but cheap, compared to other 32GB GPUs and still supported by current ROCm).
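For what it's worth, the quick sanity check I use to confirm ROCm actually sees a card (assuming a standard ROCm install; the gfx target in the comment is just an example, yours will differ):

$ rocminfo | grep -i "gfx"     # lists the gfx targets ROCm detects, e.g. gfx1151 on Strix Halo
$ rocm-smi --showproductname   # confirms the card is enumerated by the driver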
It is Nvidia that has the track record of closed drivers and of insisting on doing all software development in-house, with no path for community improvements.
The de facto GPU compute platform? With the best feature set?
Also pretty hard to beat a Strix Halo right now in TPS for the money and power consumption.
Even that aside, there exist plenty like me that demand high freedom and transparency and will pay double for it if we have to.
The market doesn't care about any of that. The consumer market doesn't care, and the commercial market definitely does not. The consumer market wants the most Fortnite frames per second per dollar. The commercial market cares about how much compute they can do per watt, per slot.
> there exist plenty like me that demand high freedom and transparency and will pay double for it if we have to.
The four percent share of the datacenter market and five percent of the desktop GPU market say (very strongly) otherwise.
I have a 100% AMD system in front of me so I'm hardly an NVIDIA fanboy, but you thinking you represent the market is pretty nuts.
I think local power efficient LLMs are going to make those datacenter numbers less relevant in the long run.
LLMs run great on it; it’s happily running gemma4 31b at the moment and I’m quite impressed. For the amount of VRAM you get it’s hard to beat, apart from the Intel cards maybe. But the driver support doesn’t seem to be that great there either.
Had some trouble running ComfyUI, but it’s not my main use case, so I haven’t spent a lot of time figuring that out yet.
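One way to confirm a model is fully resident in VRAM, by the way (a sketch; recent ollama builds):

$ ollama ps    # the PROCESSOR column should read "100% GPU" if nothing spilled to system RAM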
May I ask, what kind of tok/s are you getting with the R9700? I assume you got it fully in VRAM?
$ uname -r
6.8.0-107-generic
$ ollama --version
ollama version is 0.20.2
$ ollama run "gemma4:31b" --verbose "write fizzbuzz in python."
[...]
total duration: 45.141599637s
load duration: 143.633498ms
prompt eval count: 21 token(s)
prompt eval duration: 48.047609ms
prompt eval rate: 437.07 tokens/s
eval count: 1057 token(s)
eval duration: 44.676612241s
eval rate: 23.66 tokens/s

The model that is currently loaded full time for all workloads on this machine is Unsloth's Q3_K_M quant of Qwen 3.5 122b, which has 10b active parameters. With almost no context usage it will generate 59 tok/sec. At 10,000 input tokens it will prefill at about 1500 tok/sec and generate at 51 tok/sec. At 110,000 input tokens it will prefill at about 950 tok/sec and generate at 30 tok/sec.
Smaller MoE models with 3b active will push 70 tok/sec at 10,000 context. Dense models like Qwen 3.5 27b and Devstral Small 2 at 24b will only generate at around 13 - 15 tok/sec with 10,000 context.
This is all on llama.cpp with the Vulkan backend. I didn't get too far in testing or using anything that requires ROCm because of a ROCm bug where the GPU clock stays pinned at 100% (drawing around 60 watts) even when the model isn't processing anything. The issue is nominally closed, but multiple commenters indicate it's still a problem. Using the Vulkan backend, my per-card idle draw is between 1 and 2 watts with the display outputs shut down and no kernel frame buffer.
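For anyone who wants to reproduce this, roughly the setup; the model filename is a placeholder, and it's worth double-checking the flags against your llama.cpp build:

# build llama.cpp with the Vulkan backend instead of ROCm
$ cmake -B build -DGGML_VULKAN=ON
$ cmake --build build --config Release -j

# throughput check: 10k-token prefill, 128-token generation, all layers offloaded
$ ./build/bin/llama-bench -m qwen3.5-122b-Q3_K_M.gguf -ngl 99 -p 10000 -n 128

# watch per-card power draw to confirm the clocks drop back at idle
$ watch -n 1 rocm-smi --showpower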
Edit: I misread the "2x r9700" as "2 rx9700", which differs from the topic of this comment (about RDNA4 consumer SKUs). I'll keep my comment up, but anyone looking to get Radeon PRO cards can (should?) disregard.
If I were to do it again, I’d probably just get a DGX Spark. I don’t think it’s been worth the hassle.
But do beware, it’s weird hardware and not really Blackwell. We’re only just starting to squeeze full performance out of SM12.1!
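If you want to see what your card actually reports, querying the compute capability is straightforward (assuming a driver recent enough to expose the field):

$ nvidia-smi --query-gpu=name,compute_cap --format=csv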