Can confirm: my experience with “loop engineering” was “this is neat” for 45 minutes, until a daily ration of tokens had evaporated. The quadratic cost trap makes experimentation prohibitively expensive.
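
For anyone wondering where the "quadratic" comes from, here is a rough sketch. It assumes the loop resends the full history every turn, there is no prompt caching, and each turn adds a fixed number of tokens. The function name and the numbers are made up for illustration, not taken from any particular provider or tool.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, system_tokens: int = 0) -> int:
    """Total input tokens sent across a loop that resends the whole history each turn.

    Assumptions (hypothetical): fixed growth per turn, no caching, no truncation.
    """
    total = 0
    context = system_tokens
    for _ in range(turns):
        context += tokens_per_turn  # this turn's new content joins the history
        total += context            # the entire history is sent again
    return total


if __name__ == "__main__":
    for n in (10, 100):
        print(n, "turns:", cumulative_input_tokens(n, 1000), "input tokens")
```

With 1,000 tokens added per turn, the total works out to 1000 * n(n+1)/2, so going from 10 to 100 turns multiplies the input bill by roughly 92x rather than 10x. Real setups with caching or context trimming will come in lower.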

As a localLLM evangelist, I am hopeful this will bring more attention to the joys of rolling your own sovereign AI.

reply
Yeah, I'm hoping that gets smoother. I've been experimenting with omlx and opencode on my m5x64gb and keep running into issues with Qwen3.6-35B-A3B-MLX-8bit exceeding its memory limit at the most inopportune times. Playing with 12B gemma4 (8bit) more today.

Maybe I should be aiming for something targeting 48gb of memory?
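
A back-of-envelope way to size this, as a sketch only: weights take roughly parameters times bits per weight, and the KV cache grows linearly with context length. The layer count, KV-head count, and head dimension below are placeholder values, not the real architecture of any model named here, and the estimate ignores runtime overhead and whatever else the machine is doing. It also assumes a mixture-of-experts model keeps all of its weights resident, even though only a fraction is active per token.

```python
GIB = 2**30


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(context_tokens: int, layers: int, kv_heads: int,
                 head_dim: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size; the factor of 2 covers keys and values."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens / GIB


if __name__ == "__main__":
    # Parameter counts and bit widths from the thread.
    print(f"35B @ 8-bit weights: {weight_gib(35, 8):.1f} GiB")
    print(f"12B @ 8-bit weights: {weight_gib(12, 8):.1f} GiB")
    # Hypothetical architecture numbers, for illustration only.
    print(f"KV cache @ 128k ctx: {kv_cache_gib(131072, 48, 4, 128):.1f} GiB")
```

Under those assumed numbers, a 35B model at 8-bit is about 32.6 GiB before any cache, and a long agent session could add double-digit GiB on top, which would be consistent with hitting a limit only late in a run.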

reply
It depends on what your goals are and what you are using it for. This space is fluid, and my answer last week would be different from my answer today! That said, there’s no substitute for hard work. Here are some resources to get you up to speed:

https://carteakey.dev/blog/local-inference/local-llm-optimiz...

https://botmonster.com/ai/self-hosted-ai-agent-frameworks-20...

Personally, I find myself swapping models depending on whether I am engaged in “trad-development” or building agentic probes or apps involving imagery. Tailscale the LLM to your deployments and ta-da!

reply