You need the LLM to be able to respond with tool-use requests, and then your local harness to process them and send the results back. You can read how tool calling works with, e.g., the Claude API to get the idea: https://platform.claude.com/docs/en/agents-and-tools/tool-us...
Under the hood, something like Claude Code calls the API with tools registered; when it gets a tool-use request, it runs that tool locally and then responds to the API with the result. That's the loop that enables coding.
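That loop can be sketched in a few lines. This is a hypothetical mock (the `mock_model` and `read_file` tool are stand-ins, not any real API), but a real harness has the same shape: call the model, execute requested tools locally, feed results back until the model answers.

```python
import json

def run_tool(name, args):
    # Local tool execution -- here just one toy tool for illustration.
    if name == "read_file":
        return f"contents of {args['path']}"
    raise ValueError(f"unknown tool: {name}")

def mock_model(messages):
    # Stand-in for the real LLM API call. First turn: request a tool.
    # Once a tool result is in the conversation, produce a final answer.
    if any(m["role"] == "tool" for m in messages):
        return {"type": "text", "text": "Done: summarized the file."}
    return {"type": "tool_use", "name": "read_file",
            "args": {"path": "notes.txt"}}

def agent_loop(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = mock_model(messages)
        if reply["type"] == "tool_use":
            # The harness, not the model, actually runs the tool.
            result = run_tool(reply["name"], reply["args"])
            messages.append({"role": "assistant", "content": json.dumps(reply)})
            messages.append({"role": "tool", "content": result})
        else:
            return reply["text"]

print(agent_loop("Summarize notes.txt"))
```

Swap `mock_model` for a real API call with a `tools` parameter and you have the core of an agent.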
Integrating with an IDE specifically is really just a UI feature, rather than the core functionality.
I'm not sure whether I can make the 35B-A3B model work on my 32 GB machine.
You won't have much RAM left over though :-/.
At Q4, the weights alone are ~20 GiB.
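The arithmetic checks out roughly like this (assuming ~4.85 bits per weight, which is about what llama.cpp's Q4_K_M averages once quantization scales are included; the exact figure depends on the quant mix):

```python
# Back-of-the-envelope size estimate for a 35B-parameter model at Q4.
params = 35e9
bits_per_weight = 4.85          # assumed average for a Q4_K_M-style quant
weight_bytes = params * bits_per_weight / 8
print(f"weights: {weight_bytes / 2**30:.1f} GiB")  # ~19.8 GiB
```

KV cache and runtime overhead come on top of that, which is why 32 GB gets tight.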
So far Gemma 4 seems excellent at role playing and document analysis, and decent at making agentic decisions.