upvote
+1 using llama.cpp Vulkan releases with the Qwen models - runs much better than the ROCm releases.

I'll have to give the preserve_thinking a shot.

reply