I was also hoping to put out another post on my homelab setup, it has some neat stuff, but I haven't had a chance to finish it.
The biggest issue I've noticed is that the chat templates for open models are really hit or miss. The default Qwen3.6 chat template mostly works these days, but depending on your workload it may cause major issues. There are plenty of "fixed" chat templates on hugging face, but people report mixed success. It really seems to depend a lot on what the tool you're using expects.
I have 27b, 35B-A3B and a cpu backed gpt-oss configured and use them in parallel, checking if one is getting ratholed and adding context or manual fixes.
I had various other systems setup and commercial models but really don’t use them.
It may be too interactive for some people, but it is a good mix of fail fast and often the places qwen3.6 was failing was eventually problems with the frontier models.
And this is with the unsloth defaults and hardened llama.cpp podman containers.
I do sometimes load other models or honestly just feed things into google’s free agent. But that is rare and to be honest manually fixing is typically faster and less error prone