It's not 100% offline, but token usage drops dramatically, as long as you can put up with the speed.
You can run models like qwen3.5 on local hardware with Ollama and redirect Claude to the local Ollama API endpoint instead of Anthropic’s servers.
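Here's a rough sketch of the redirect, in case it helps. It assumes you're using the Claude Code CLI (`claude`), that it picks up the `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN` and `ANTHROPIC_MODEL` environment variables, and that your Ollama version serves an Anthropic-compatible API on its default port 11434 (if it doesn't, you'd need a translating proxy in between). The helper name `local_claude_env`, the placeholder token and the exact `qwen3.5` tag are my assumptions, so swap in whatever `ollama list` shows on your machine.

```python
import os
import shutil
import subprocess


def local_claude_env(base_url="http://localhost:11434", model="qwen3.5", base_env=None):
    """Return a copy of the environment that points Claude Code at a local endpoint.

    Hypothetical helper: the URL is Ollama's default local address, and the
    model tag is an assumption that must match a model you have pulled.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url
    # Placeholder value; assumes the local server does not validate the token.
    env["ANTHROPIC_AUTH_TOKEN"] = "ollama"
    env["ANTHROPIC_MODEL"] = model
    return env


if __name__ == "__main__":
    # Only launch the CLI if it is actually installed.
    if shutil.which("claude") is None:
        raise SystemExit("claude CLI not found on PATH")
    subprocess.run(["claude"], env=local_claude_env(), check=False)
```

Setting the same three variables in your shell before running `claude` should do the same thing; the wrapper just keeps the override scoped to one process so your normal setup stays untouched.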