Mistral has small variants (3B, 8B, 14B, etc.), as do others like IBM Granite and Qwen. Then there are finetunes based on these models, depending on your workflow/requirements.
True, but anything remotely useful is 300B and above
I'm actually doing a good part of my dev work now with Qwen3-Coder-Next on an M1 64GB with Qwen Code CLI (a fork of Gemini CLI). I very much like

  a) having an idea of how many tokens I use,
  b) being independent of VC-financed token machines, and
  c) being able to use it on a plane/train.
Also, I never have to wait in a queue, nor am I told to come back in a few hours. And many answers come back within a second.
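If you want a rough feel for token use without depending on any provider's counter, the common ~4-characters-per-token heuristic for English text gets you in the right ballpark. A minimal sketch (the helper name is mine, and the real count varies by model tokenizer):

```python
# Rough token-count estimate using the common ~4 chars/token
# heuristic for English-like text. Actual counts depend on the
# model's tokenizer, so treat this as a ballpark only.
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

prompt = "Refactor this function to avoid the nested loop."
print(estimate_tokens(prompt))  # prints 12
```

For exact numbers you'd load the model's actual tokenizer, but for budgeting a local context window the heuristic is usually close enough.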

I don't do full vibe coding with a dozen agents, though. I read all the code it produces and guide it where necessary.

Last but not least, at some point the VC-funded party will be over, and when that happens it's better to already know how to be highly efficient with AI token use.
