In my experience using llama.cpp (which ollama uses internally) on a Strix Halo, whether ROCm or Vulkan performs better really depends on the model, and the difference is usually within 10%. I do have access to an RX 7900 XT that I should compare against, though.
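For anyone who wants to reproduce this kind of comparison: llama.cpp selects the backend at build time, so you build twice and run the bundled llama-bench tool against the same model. A rough sketch (flag names follow recent llama.cpp; older versions used e.g. LLAMA_HIPBLAS, and the model path here is a placeholder):

```shell
# Build with the Vulkan backend
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release -j

# Build with the ROCm/HIP backend (requires a ROCm install)
cmake -B build-rocm -DGGML_HIP=ON
cmake --build build-rocm --config Release -j

# Benchmark the same model on each: -ngl 99 offloads all layers to the GPU,
# -p/-n set prompt and generation token counts
./build-vulkan/bin/llama-bench -m model.gguf -ngl 99 -p 512 -n 128
./build-rocm/bin/llama-bench -m model.gguf -ngl 99 -p 512 -n 128
```

llama-bench reports prompt-processing and token-generation throughput separately, which matters here since the two backends can win on different phases depending on the model.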
reply
Perhaps I should just google it, but I'm under the impression that ollama uses llama.cpp internally, not the other way around.

Thanks for that data point; I should experiment with ROCm.

reply
I meant ollama uses llama.cpp internally. Sorry for the confusion.
reply
From what I understand, ROCm is a lot buggier and has some performance regressions on a lot of GPUs in the 7.x series. Vulkan performance for LLMs is apparently not far behind ROCm and is far more stable and predictable at this time.
reply
For me Vulkan performs better on integrated cards, but ROCm (MIGraphX) on 7900 XTX.
reply
Wrong layer. Vulkan is a graphics and compute API, while Lemonade is an LLM server, so comparing them makes about as much sense as comparing sockets to nginx. If your goal is to run local models without writing half the stack yourself, compare Lemonade to Ollama or vLLM.
reply
I was talking about ROCm vs Vulkan. On AMD GPUs, Vulkan has been commonly recognized as the faster API for some time. Both have been slower than CUDA due to most of the hosting projects focusing entirely on Nvidia. Parent post seemed to indicate that newer ROCm releases are better.
reply
Yes, Vulkan is currently faster due to some ROCm regressions: https://github.com/ROCm/ROCm/issues/5805#issuecomment-414161...

ROCm should end up faster, if those issues ever get fixed.

reply