You can point Claude Code at a local inference server (e.g. llama.cpp, vLLM) and see which model names it sends each request to. It's not hard to do a MITM against it either. Claude Code does send some requests to Haiku, but not the ones you're making with whatever model you have it set to - these are tool result processing requests, conversation summary / title generation requests, etc - low complexity background stuff.
Now, Anthropic could simply take requests to their Opus model and internally route them to Sonnet on the server side, but then it wouldn't really matter which harness was used or what the client requests anyway, as this would be happening server-side.
Actually curious to hear what others think about why Anthropic is so set on disallowing 3rd party tools on subscriptions.
So they have to move up the stack to higher margin business solutions. Which is why they offer subsidized subscription plans in the first place. It’s a marketing cost. But they want those marketing dollars to drive up the stack not commodity inference use cases.