But there are EU-only providers for GLM5.2, for example tensorx. Depending on your definition of "secure", that may be acceptable.
I have not tried it, but I will take your word for it. I don't think Qwen3.6 cuts it for large-scale coding work. Reading issues and reading code, sure. But when it has to bite into a large issue, no: it consistently goes off track.
Depending on your budget, it may also be affordable to spin up servers and run it on demand.
For real work, anything below 60 tokens per second is essentially unusable. And that's not even taking prompt filling into account. Llama 3.1 70B on a DGX Spark runs at about 800 tps; at that speed, prompt filling a 512k context takes about 11 minutes.
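
For anyone who wants to check the 11 minute figure, here is a rough back-of-the-envelope sketch. It assumes "512k" means 512 * 1024 tokens and that the 800 tps number is the prompt processing rate; the function name `prefill_seconds` is just made up for illustration.

```python
def prefill_seconds(context_tokens: int, prefill_tps: float) -> float:
    """Seconds needed to process a prompt at a given tokens-per-second rate."""
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return context_tokens / prefill_tps


# Assumption: "512k" is read as 512 * 1024 = 524,288 tokens.
context_tokens = 512 * 1024
# Assumption: 800 tps is the prompt processing (prefill) speed quoted above.
prefill_tps = 800.0

seconds = prefill_seconds(context_tokens, prefill_tps)
print(f"{seconds:.0f} s = {seconds / 60:.1f} min")  # prints "655 s = 10.9 min"
```

Reading "512k" as 512,000 tokens instead gives 640 s, so either way it lands at roughly 11 minutes before the first output token shows up.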