You can create an MCP server that calls out to Ollama, then have Claude farm work out to local models where raw power isn't required. Claude can then review the local model's output.

It's not 100% offline, but there's a dramatic drop in token usage, as long as you can put up with the speed.
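To make the idea concrete, here's a minimal sketch of what the body of such an MCP tool could look like. It assumes Ollama's default port (11434) and uses its `/api/generate` endpoint; the model name and function names are just illustrative placeholders.

```python
import json
import urllib.request

# Assumed Ollama default endpoint (non-streaming generate API).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "qwen3.5") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def delegate_to_local(prompt: str, model: str = "qwen3.5") -> str:
    """Send a prompt to a local Ollama instance and return its reply.

    Registered as an MCP tool, Claude can call this to farm a task out
    to a local model, then review the returned text itself.
    """
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

Wrap `delegate_to_local` as a tool in whatever MCP server framework you prefer, and Claude only spends tokens on the delegation prompt and the review, not on generating the bulk of the work.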

https://docs.ollama.com/integrations/claude-code

You can run models like qwen3.5 on local hardware in Ollama and redirect Claude to use the local Ollama API endpoint instead of Anthropic's servers.
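Roughly, per the linked docs, the redirect is just environment variables; the exact variable names and the model name below are assumptions to check against docs.ollama.com.

```shell
# Point Claude Code's Anthropic-compatible client at the local
# Ollama server instead of Anthropic's API.
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama   # any non-empty placeholder
claude --model qwen3.5               # model name is an example
```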

I believe you can use Claude Code as the primary model, driving local agents that use local models.

You can connect it to any Anthropic-compatible endpoint (Kimi allows this), but it's a weird choice given that OpenCode, pi.dev, and others are open source.