And even if you somehow manage to open up a big enough VRAM playground, the open-weights models aren't as good at wrangling such large context windows (even Opus barely manages); they tend to lose track of what they were doing before they finish parsing it all.
I'd rate their coding agent harness as somewhere between slightly and significantly less capable than Claude Code, but it plays better with alternate models.
I'm with you on this. I've tried Gemma with Claude Code and it's not good. It forgets it can use bash!
However, Gemma running locally with Pi as the harness is a beast.