Seconded. Gemini used to be trash, and I used Claude and Codex a lot, but gemini-3-flash-preview punches above its weight. It's decent, and I rarely, if ever, run into token limits either.
Thirded. I've been using gemini-3-flash to great effect. Anytime I have something more complicated, I give it to both pro and flash to see what happens. It's a coin flip whether flash comes out nearly equivalent (too many moving variables to be analytical at this point).
What models are you running locally? Just curious.

I am mostly restricted to 7-9B models. I still like the ancient early Llama releases because they're pretty unrestricted without having to use an abliterated variant.

I experimented with many models on my 16G and 32G Macs. For the smaller machine, qwen3:4b is good; for the 32G Mac, gpt-oss:20b is good. I like the smaller Mistral models like mistral:v0.3, and rnj-1:latest is a pretty good small reasoning model.
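
Rough back-of-envelope on why those pairings work (my own sketch, assuming roughly 4-bit quantization; actual quant levels vary by model tag):

    # Weights at ~4-bit quantization take about 0.5 bytes per parameter.
    # KV cache, context, and OS overhead come on top of this.
    for name, params in [("qwen3:4b", 4e9), ("gpt-oss:20b", 20e9)]:
        weight_gb = params * 0.5 / 2**30
        print(f"{name}: ~{weight_gb:.1f} GB of weights")
    # qwen3:4b:    ~1.9 GB of weights -> comfortable on the 16G machine
    # gpt-oss:20b: ~9.3 GB of weights -> wants the 32G machine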
I like to ask Claude how to prompt smaller models for a given task. With one prompt, it was able to make a heavily quantized model call multiple functions via JSON.
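
For the curious, here's the general shape of that pattern, a minimal sketch assuming a local Ollama server; the tool names (get_weather, get_time) and the prompt wording are made up for illustration, not the exact prompt Claude produced:

    import json
    import requests

    # Small quantized models drift into prose unless the exact schema is
    # spelled out and everything else is forbidden.
    SYSTEM = (
        'You are a function-calling assistant. Respond ONLY with JSON of '
        'the form {"calls": [{"name": "<tool>", "arguments": {...}}]}. '
        "Available tools: get_weather(city), get_time(timezone)."
    )

    def call_tools(user_msg, model="qwen3:4b"):
        # Ollama's /api/chat endpoint; format="json" constrains the output
        # to valid JSON, which helps a lot at low quantization.
        resp = requests.post(
            "http://localhost:11434/api/chat",
            json={
                "model": model,
                "format": "json",
                "stream": False,
                "messages": [
                    {"role": "system", "content": SYSTEM},
                    {"role": "user", "content": user_msg},
                ],
            },
        )
        return json.loads(resp.json()["message"]["content"])["calls"]

    print(call_tools("Weather in Oslo and the current time in UTC?"))

Even then, a heavily quantized model will occasionally botch the schema, so wrapping the json.loads in a retry is worth it in practice.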