upvote
Ollama is good enough to dabble with, and getting a model is as easy as ollama pull <model name> vs figuring it out by yourself on hugging face and trying to make sense on all the goofy letters and numbers between the forty different names of models, and not needing a hugging face account to download.

So you start there and eventually you want to get off the happy path, then you need to learn more about the server and it's all so much more complicated than just using ollama. You just want to try models, not learn the intricacies of hosting LLMs.

reply
to be fair, llama.cpp has gotten much easier to use lately with llama-server -hf <model name>. That said, the need to compile it yourself is still a pretty big barrier for most people.
reply
[delayed]
reply
I started with ollama and now I'm using llama.cpp/llama-server's Router Mode that allows you to manage multiple models through a single server instance.

One thing I haven't figured out: Subjectively, it feels like ollama's model loading was nearly instant, while I feel like I'm always waiting for llama.cpp to load models, but that doesn't make sense because it's ultimately the same software. Maybe I should try ollama again to convince myself that I'm not crazy and that ollama's model loading wasn't actually instant.

reply
You don't need to compile it yourself though? Unless you want CUDA support on Linux I guess, dunno why you'd need such a silly thing though:

https://github.com/ggml-org/llama.cpp/releases

reply
Ollama got some first-mover advantage at the time when actually building and git pulling llama.cpp was a bit of a moat. The devs' docker past probably made them overestimate how much they could lay claim to mindshare. However, no one really could have known how quickly things would evolve... Now I mostly recommend LM-studio to people.

What does unsloth-studio bring on top?

reply
LM Studio has been around longer. I’ve used it since three years ago. I’d also agree it is generally a better beginner choice then and now.

Unsloth Studio is more featureful (well integrated tool calling, web search, and code execution being headline features), and comes from the people consistently making some of the best GGUF quants of all popular models. It also is well documented, easy to setup, and also has good fine-tuning support.

reply
LM Studio isn't free/libre/open source software, which misses the point of using open weights and open source LLMs in the first place.
reply
Disagree, there are a lot of reasons to use open source local LLMs that aren't related to free/libre/oss principles. Privacy being a major one.
reply
If you care about privacy making sure the closed source software does not call home is a concern...
reply
I run Little Snitch[1] on my Mac, and I haven't seen LM Studio make any calls that I feel like it shouldn't be making.

Point it to a local models folder, and you can firewall the entire app if you feel like it.

Digressing, but the issue with open source software is that most OSS software don't understand UX. UX requires a strong hand and opinionated decision making on whether or not something belongs front-and-center and it's something that developers struggle with. The only counterexample I can think of is Blender and it's a rare exception and sadly not the norm.

LM Studio manages the backend well, hides its complexities and serves as a good front-end for downloading/managing models. Since I download the models to a shared common location, If I don't want to deal with the LM Studio UX, I then easily use the downloaded models with direct llama.cpp, llama-swap and mlx_lm calls.

[1]: https://obdev.at

reply
What I really don't get is why more people don't talk about LMStudio, I switched to it months ago and it seems like a straight upgrade.
reply
Isn’t LMStudio closed source?
reply
How does LMStudio compare to Unsloth Studio?
reply
Advertising, mostly.

Ollama's org had people flood various LLM/programming related Reddits and Discords and elsewhere, claiming it was an 'easy frontend for llama.cpp', and tricked people.

Only way to win is to uninstall it and switch to llama.cpp.

reply
Ollama user with the opposite question -- why not? What am I missing out on? I'm using it as the backend for playing with other frontend stuff and it seems to work just fine.

And as someone running at 16gb card, I'm especially curious as to if I'm missing out on better performance?

reply
> Ollama user with the opposite question -- why not? What am I missing out on? I'm using it as the backend for playing with other frontend stuff and it seems to work just fine.

Used to be an Ollama user. Everything that you cite as benefits for Ollama is what I was drawn to in the first place as well, then moved on to using llama.cpp directly. Apart from being extremely unethical, The issue is that they try to abstract away a bit too much, especially when LLM model quality is highly affected by a bunch of parameters. Hell you can't tell what quant you're downloading. Can you tell at a glance what size of model's downloaded? Can you tell if it's optimized for your arch? Or what Quant?

`ollama pull gemma4`

(Yes, I know you can add parameters etc. but the point stands because this is sold as noob-friendly. If you are going to be adding cli params to tweak this, then just do the same with llama.cpp?)

That became a big issue when Deep Seek R1 came out because everyone and their mother was making TikToks saying that you can run the full fat model without explaining that it was a distill, which Ollama had abstracted away. Running `ollama run deepseek-r1` means nothing when the quality ranges from useless to super good.

> And as someone running at 16gb card, I'm especially curious as to if I'm missing out on better performance?

I'd go so far as to say, I can *GUARANTEE* you're missing out on performance if you are using Ollama, no matter the size of your GPU VRAM. You can get significant improvement if you just run underlying llama.cpp.

Secondly, it's chock full of dark patterns (like the ones above) and anti-open source behavior. For some examples:

1. It mangles GGUF files so other apps can't use them, and you can't access them either without a bunch of work on your end (had to script a way to unmangle these long sha-hashed file names) 2. Ollama conveniently fails contribute improvements back to the original codebase (they don't have to technically thanks to MIT), but they didn't bother assisting llama.cpp in developing multimodal capabilities and features such as iSWA. 3. Any innovations to the do is just piggybacking off of llama.cpp that they try to pass off as their own without contributing back to upstream. When new models come out they post "WIP" publicly while twiddling their thumbs waiting for llama.cpp to do the actual work.

It operates in this weird "middle layer" where it is kind of user friendly but it’s not as user friendly as LM Studio.

After all this, I just couldn't continue using it. If the benefits it provides you are good, then by all means continue.

IMO just finding the most optimal parameters for a models and aliasing them in your cli would be a much better experience ngl, especially now that we have llama-server, a nice webui and hot reloading built into llama.cpp

reply
Ollama has had bad defaults forever (stuck on a default CTX of 2048 for like 2 years) and they typically are late to support the latest models vs llamacpp. Absolutely no reason to use it in 2026.
reply
For me it's just the server. I use openwebui as interface. I don't want it all running on the same machine.
reply