How did you land on that model? Hard to tell whether I should be a) going to 3.5, b) going to fewer parameters, or c) going to a different quantization/variant.
I didn't consider those other flags either, cool.
Are you having good luck with any particular harnesses or other tooling?
If you want to keep using the same model, these settings worked for me.
llama-server -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --host 0.0.0.0 --sleep-idle-seconds 300 -m Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf
For the harness, I use pi (https://pi.dev/). And sometimes I use the Roo Code plugin for VS Code (https://roocode.com/).
I prefer simplicity in my tooling, so it's easier for me to understand. But you might have better luck with other harnesses.