Now, if you ask this model to have a conversation with you, it's gonna fail and be incoherent. But boy, does it reason through math problems well.
Edit: seems fast! I'll try it out some more, thanks again.
This Q5_K_M quant should be near-lossless and fit, with the full 256K context, in about 100GB of RAM: https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF
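A rough back-of-the-envelope check on that ~100GB figure, weights only. This assumes Q5_K_M averages about 5.5 bits per weight (an approximation; the real number depends on the tensor mix) and that all 122B parameters are quantized at that rate. The KV cache for 256K context comes on top and depends on the model's attention layout, so it is not estimated here.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 122e9    # total parameter count, taken from the model name
ASSUMED_BPW = 5.5   # assumption: rough average bits per weight for Q5_K_M

weights_gb = quantized_weight_gb(N_PARAMS, ASSUMED_BPW)
print(f"weights alone: ~{weights_gb:.1f} GB")
```

Under those assumptions the weights come to roughly 84 GB, which leaves on the order of 16 GB of a 100GB budget for context and runtime overhead.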
Edit: specifically Qwen 3.6 27B beats that on coding and agentic workflows.
The Q8_K_XL MTP model from Unsloth: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF
I serve the model with Ollama and am thinking about replacing it, but haven't looked into alternatives yet.
I have Open WebUI for chat if I want that too, but don't really use it.
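For anyone who wants to script against an Ollama setup like this instead of going through a chat UI, here is a minimal sketch using only the Python standard library. It assumes Ollama is listening on its default local port (11434) and uses the `/api/generate` endpoint with streaming turned off; the helper names and the model tag in the usage comment are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running server; "my-local-model" is a placeholder tag):
# print(generate("my-local-model", "Say hello in one sentence."))
```

Keeping the request construction separate from the network call makes it easy to point the same code at a different server later, which helps if Ollama does get replaced.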