Ollama's mmap support is poor at present, which hurts inference with larger models. A few recent pull requests in flight should help to at least some extent (https://github.com/ollama/ollama/pull/14525, https://github.com/ollama/ollama/pull/14134, https://github.com/ollama/ollama/pull/14864), but progress seems to be stalling. Ollama's support for recent Qwen models also seems to have some bespoke incompatibilities with llama.cpp, which doesn't help matters: it's difficult to test the same model with both.

by rubiquity 4 hours ago

llama.cpp and llama-swap do this better than Ollama and with far more control.

by circularfoyers 3 hours ago

You don't even need llama-swap anymore now that llama-server supports the same functionality.