I had the best success yet earlier today running https://pi.dev with a local gemma4 model on Ollama on my M4 Mac with 48GB of RAM. I think pi is a lot lighter than Claude Code.
reply
I didn’t think pi supported local models?
reply
It does! Ollama provides a helper to launch it with the local model too: https://docs.ollama.com/integrations/pi

So you can do:

    ollama launch pi --model gemma4:26b
And it launches pi pointed at the local model in one command. pi seems to do some settings caching too, because after doing the above once I can just run `pi` and it's already set up to use the local model.
reply
pi does; it can talk to anything that speaks the OpenAI API.
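If it helps to see what "any OpenAI API" means concretely: Ollama serves an OpenAI-compatible endpoint locally at `http://localhost:11434/v1`, so any client that builds a standard chat-completions request can target it. The helper below is just an illustration of the request shape (the model name is the one from this thread), not pi's actual config mechanism:

```python
import json

# Ollama's local OpenAI-compatible endpoint; the model tag is the one
# mentioned upthread. chat_request is a hypothetical helper for illustration.
OLLAMA_BASE_URL = "http://localhost:11434/v1"

def chat_request(base_url, model, prompt):
    """Build a standard OpenAI-style chat-completions request."""
    return {
        "url": f"{base_url}/chat/completions",
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = chat_request(OLLAMA_BASE_URL, "gemma4:26b", "hello")
```

Any tool that lets you override the base URL (as pi apparently does) can point this same payload at the local server instead of a hosted one.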
reply
The context-window size increase for the Qwen3.6 models isn't that bad (e.g. you can likely fit max context well within the 48GB), but MacBook prompt processing is notoriously slow (at least up through the M4; the M5 got some speedup, but I haven't messed with it).
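For a back-of-envelope feel for why max context can still fit: KV-cache memory scales linearly with context length. The dimensions below are illustrative placeholders (a GQA config I made up), not real Qwen3.6 specs:

```python
# Rough KV-cache sizing. The factor of 2 is for the K and V tensors;
# bytes_per_elem=2 assumes an fp16 cache. All dimensions are hypothetical.
def kv_cache_gib(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimated KV-cache footprint across all layers, in GiB."""
    total = 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem
    return total / 2**30

# e.g. 48 layers, 8 KV heads, head_dim 128, 128k context, fp16:
est = kv_cache_gib(layers=48, kv_heads=8, head_dim=128, context_len=131072)
# ~24 GiB for the cache under these made-up numbers
```

So even a full 128k context on a mid-size GQA model leaves headroom in 48GB alongside a quantized model, though the real numbers depend entirely on the actual architecture.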

One thing to keep in mind is that you don't need to fully fit the model in memory to run it. For example, I'm able to get acceptable token-generation speed (~55 tok/s) on a 3080 by offloading the expert layers. I don't remember the prompt-processing speed, but generally speaking prompt processing is compute-bound, so it benefits more from an actual GPU.
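The reason expert offloading works so well on a small GPU: in a MoE, the bulk of the weights sit in the expert FFNs, while the attention (and any shared) tensors are a thin slice. The numbers below are illustrative, not any specific model's real config:

```python
# Split a quantized MoE's weight footprint between GPU-resident tensors
# (attention/shared) and CPU-offloaded expert tensors. bytes_per_param=0.55
# roughly approximates a ~4.4-bit quant. All figures are hypothetical.
def vram_split_gib(total_params_b, expert_frac, bytes_per_param=0.55):
    """Return (gpu_resident_gib, cpu_offloaded_gib) for the model weights."""
    total_gib = total_params_b * 1e9 * bytes_per_param / 2**30
    expert_gib = total_gib * expert_frac
    return total_gib - expert_gib, expert_gib

# e.g. a 30B-param MoE where ~90% of weights are expert FFNs:
on_gpu, on_cpu = vram_split_gib(30, 0.90)
# only ~1.5 GiB must stay on the GPU under these assumptions
```

Under those made-up numbers the GPU-resident slice easily fits a 10GB 3080, which is why generation stays usable even though most of the model lives in system RAM; prompt processing still has to stream through everything, which is part of why it's the slow path.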

reply
Try running it with Open Code. It works quite well.
reply
I had an equally painful experience with Open Code. I don't think the harness is the issue. It's the need for a large context window and slow inference.
reply