So you can do:
ollama launch pi --model gemma4:26b
And it launches and points to the local model in one command. pi seems to do some settings caching too, because after doing the above once I can just run `pi` and it's already set up to use the local model.

One thing to keep in mind is that you don't need to fit the whole model in memory to run it. For example, I can get acceptable token generation speed (~55 tok/s) on a 3080 by offloading the expert layers to CPU. I don't remember the prompt processing speed, but generally speaking prompt processing is compute bound, so it benefits more from an actual GPU.
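For what it's worth, a sketch of what expert-layer offloading looks like with llama.cpp's `llama-server` (ollama doesn't expose tensor-level offload itself, and the model path here is a placeholder):

```shell
# -ngl 99 offloads all layers to the GPU, then --override-tensor (-ot)
# forces tensors whose names match the regex (the MoE expert FFN
# weights) back to CPU RAM, so only the smaller attention/shared
# layers occupy VRAM.
llama-server -m ./model.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU" -c 8192
```

The trade-off is exactly what's described above: generation stays usable because the hot per-token path fits on the GPU, while the bulk of the weights stream from system RAM.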