If you want to keep using the same model, these settings worked for me.
llama-server -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --host 0.0.0.0 --sleep-idle-seconds 300 -m Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf
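Once the server is up, it exposes an OpenAI-compatible HTTP API (on port 8080 unless you pass --port), so any client that speaks that protocol can talk to it. A quick smoke test with curl, assuming the default host and port:

```shell
# Assumes llama-server is running locally on the default port (8080).
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 16
      }'
```

If that returns a JSON completion, the harness side is just a matter of pointing its base URL at the same endpoint.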
For the harness, I use pi (https://pi.dev/), and sometimes the Roo Code plugin for VS Code (https://roocode.com/).
I prefer simple tooling so it's easier to understand, but you might have better luck with other harnesses.