The models usually run fine on the server-targeted backends they’re released for.
Those projects you cited are more niche. They each implement their own ways of doing things.
It’s not the responsibility of model providers to implement and debug every different backend out there before they release their model. They release the model and usually a reference way of running it.
The individual projects that do things differently are responsible for making their projects work properly.
Don’t blame the open weight model teams when unrelated projects have bugs!
Sure, for a single use case you could use a ~20B model if you fine-tune it and the use case is very narrow, but at that point there are usually better solutions than LLMs in the first place. For something general, 32B+ at Q8 is probably the bare minimum for local models, even for the "SOTA" ones available today.
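
For a rough sense of what those sizes mean on local hardware, here is a back-of-the-envelope sketch. It assumes a nominal 8 bits per weight for Q8 and 4 bits for Q4, and it counts only the weights (no KV cache, activations, or per-block quantization overhead), so real usage will be somewhat higher. The function name `weight_memory_gb` is just made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same nominal bit width.
    Real quantization formats add some overhead for scales and metadata.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Nominal bit widths; these are illustrative, not exact format sizes.
for label, params_billion, bits in [
    ("20B at Q8", 20, 8),
    ("32B at Q8", 32, 8),
    ("32B at Q4", 32, 4),
]:
    print(f"{label}: ~{weight_memory_gb(params_billion, bits):.0f} GB for weights")
```

So under these assumptions a 32B model at Q8 needs on the order of 32 GB for the weights before any context is loaded, which is why that tier tends to mean multiple GPUs or a large unified-memory machine.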