undefined

points

by v3ss0n2 hours ago |

comments

by xrd1 hours ago|

[-]

How do you run it? vllm? llama.cpp?

Can you share some parameters you enable tool calling and agentic usage?

Or, higher level, some philosophies on what approaches you are using for tuning to get better tool calling and/or agentic usage?

I'm having surprisingly good success with unsloth/Qwen3.6-27B-GGUF:Q4_K_M (love unsloth guys) on my RTX3090/24GB using opencode as the orchestrator.

It concocts some misleading paths, but the code often compiles, and I consider that a victory.

You have to watch it like you would watch a 14 year old boy who says he is doing his homework but you hear the sound effects of explosions.

by 59nadir1 hours ago|

prev|

[-]

Counter-point: I built an agent that can only interface with Kakoune, a much less common and more challenging situation for an LLM to find itself in, and Gemma4-A4B 8bit quantized does remarkably better in actually figuring out how to get text in buffers than Qwen3.6-35B-A3B in a similar class as Gemma4 A4B.

Now, is this the usual use case? No, it's a benchmark I created specifically in order to put LLMs in situations where they can't just blast out their bash commands without having to interface with something else and adapt.

by lambda1 hours ago|

prev|

[-]

Gemma 4 31b was working ok for me; but it was consuming tons of memory on SWA checkpoints, I had to turn them way down, and as a 31b dense model is fairly slow on a Strix Halo. I did have a lot of tool calling issues on 26b-a4b, though.

The Qwen models are quite solid though.

by 2ndorderthought1 hours ago|

prev|

[-]

Gemma4 is definitely not used for vibe/agentic coding. Not even worth trying. But its a different weight class.