This approach essentially requires that you trust the llama.cpp codebase, and it might be reasonable not to.
I suppose in principle there is the risk of a prompt exploit corrupting the inference server.