For coding, Qwen3.6-27B with MTP should fit in 32GB at almost full context length using Unsloth's 5-bit quantization. That's my preferred choice for a local coding agent on similar hardware: the quality delta compared to a MoE model is IMO worth the extra wait. (And I haven't found a model with 70B-120B parameters that works better for coding.) For general chat, maybe gpt-oss-120b? It should have more general knowledge than a 30B-class model; I've used it to suggest itineraries for trips and to review the completeness of small requests for proposals.
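
Rough back-of-envelope for the 32GB claim, in case it helps. This is only a sketch: the 5.5 bits/weight figure is my assumption for what a "5-bit" quant averages once scales and mixed-precision layers are counted, not a number from Unsloth, and the function names are made up for illustration. Whatever is left after the weights has to cover the KV cache, the MTP head, and runtime overhead, none of which I'm estimating here.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal GB."""
    # params (in billions) * bits per weight / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


def headroom_gb(budget_gb: float, params_billion: float, bits_per_weight: float) -> float:
    """Memory left over for KV cache and runtime overhead after loading weights."""
    return budget_gb - weights_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Assumed values: 27B dense parameters, ~5.5 effective bits/weight, 32 GB budget.
    print(f"weights:  {weights_gb(27, 5.5):.1f} GB")
    print(f"headroom: {headroom_gb(32, 27, 5.5):.1f} GB")
```

So under those assumptions the weights take a bit under 19 GB, which leaves roughly 13 GB for context. How far that stretches depends on the model's attention layout and whether you quantize the KV cache.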

I don't have recommendations for images because I haven't played with those.

these days, even completely mainstream distros (Fedora here) package ollama, which supports a wide range of hardware and models. (it's generally worth installing a more recent ollama than the one the distro ships, though.) there are free coding harnesses too.

ollama is just a wrapper around llama.cpp, and a pretty janky one at that. You're much better off using llama.cpp directly.