If you don't have the source code then it makes no difference. If you have the weights and are running some model via llama.cpp, then you are using whatever API llama.cpp is using, not the API that was used to train the model or that anyone else may be using to serve it.
If the card supports Vulkan and the model has GGUF weights, it should work. llama.cpp has excellent Vulkan support that is being actively developed and is not that far behind CUDA where speed is concerned.

* https://github.com/ggml-org/llama.cpp/releases
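For reference, a minimal sketch of building and running llama.cpp with the Vulkan backend, assuming the Vulkan SDK is installed and `model.gguf` is a placeholder for whatever GGUF weights you have:

```shell
# Build llama.cpp with the Vulkan backend enabled (GGML_VULKAN CMake option).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Run inference, offloading all layers to the GPU (-ngl 99).
# model.gguf is a placeholder for your actual GGUF weights file.
./build/bin/llama-cli -m model.gguf -ngl 99 -p "Hello"
```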

If you found a rare 9000 card with 200+ GB of VRAM, sure