You can compile it from source. All you need to do is clone the repository and run `cmake -B build -DGGML_VULKAN=1` (add other backends if you want) followed by `cmake --build build --config Release`, and then you get all the llama tools in `build/bin` (including `llama-server`, which provides a web-based interface). There is a `docs/build.md` with more detailed info, especially if you need another backend. At least on my RX 7900 XTX I see no difference in performance between Vulkan and ROCm, and the former is much more stable and compatible -- I tried ROCm for a bit thinking it'd be much faster, but it only ended up being much more annoying, as some models would OOM on it while they worked fine on Vulkan. If you're on NVIDIA hardware all this may sound quaint though :-P
Cool, I assume this is how adults use LLMs.

I’m on an NVIDIA GPU, but I want to be able to combine VRAM with system memory.

Why are you looking to move off Ollama? Just curious because I'm using Ollama and the cloud models (Kimi 2.5 and Minimax 2.7) which I'm having lots of good success with.
Ollama co-mingles online and local models, which defeats the purpose for me.
You can disable all cloud models in your Ollama settings if you just want local ones. Even with cloud enabled, you don't have to use the cloud models unless you explicitly request them.
Why not just download the binaries from the GitHub releases?