I'm currently experimenting with (well, attempting) fine-tuning Qwen3.5 to make it better at a specific language (Nim in this case), but I'm quite bad at this, and honestly I'm unsure whether it's even feasible at the scale I have access to. It's certainly been fun so far though, and I have a little Asus GX10 box on the way to experiment some more!
I've been playing around with fine-tuning models for specific languages as well (Clojure and Rust mostly), but the persistent problem is finding high-quality datasets. Mostly I've been generating my own from my own repositories and chat sessions. What approach are you taking for gathering the data?
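For anyone curious what "generating my own from repositories" might look like, here is a rough sketch of the repo-to-JSONL step. It is only an illustration: the function names, the size thresholds, the skipped directory names, and the record layout (`path`, `text`) are all assumptions, not what anyone in this thread actually runs.

```python
import json
from pathlib import Path

# Directories that usually hold VCS metadata or build output (assumed list).
SKIP_DIRS = {".git", "target", "node_modules"}

def collect_samples(repo_root, extensions=(".clj", ".rs"),
                    min_chars=200, max_chars=8000):
    """Yield one record per source file whose size falls inside the bounds."""
    root = Path(repo_root)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        rel = path.relative_to(root)
        # Skip anything that lives under a VCS or build directory.
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Very short files carry little signal; very long ones blow up context.
        if min_chars <= len(text) <= max_chars:
            yield {"path": str(rel), "text": text}

def write_jsonl(records, out_path):
    """Write records as one JSON object per line; return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Usage would be something like `write_jsonl(collect_samples("path/to/repo"), "samples.jsonl")`. Deduplication, license filtering, and turning raw files into instruction/response pairs are left out here, and those are probably where most of the quality work goes.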
In my own experience trying many different models, the general intelligence of the model matters more than language-specific fine-tuning.
If you want it to stick to better practices, you have to write skills, provide references (example code it can read), and give it harness tools (linters, debuggers, etc.) so the agent can iterate on its own output.
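To make the "iterate on its own output" part concrete, here is a minimal sketch of a lint-and-retry loop. Everything in it is hypothetical: `generate` stands in for whatever model or agent call you use, and the checker command is assumed to read source on stdin and exit non-zero on problems, which not every linter does.

```python
import subprocess
from typing import Callable, List, Optional

def run_check(cmd: List[str], source: str) -> Optional[str]:
    """Feed source to a checker on stdin.

    Returns None when the checker exits with 0, otherwise its combined output.
    """
    proc = subprocess.run(cmd, input=source, capture_output=True, text=True)
    if proc.returncode == 0:
        return None
    return (proc.stdout + proc.stderr).strip()

def iterate(generate: Callable[[str, Optional[str]], str],
            check: Callable[[str], Optional[str]],
            task: str,
            max_rounds: int = 3) -> Optional[str]:
    """Ask for code, check it, and pass the checker output back as feedback.

    Returns the first candidate that passes, or None if every round fails.
    """
    feedback = None
    for _ in range(max_rounds):
        code = generate(task, feedback)  # hypothetical model call
        feedback = check(code)
        if feedback is None:
            return code
    return None
```

The point of the design is that the model never has to "know" the best practices up front; the checker output carries them back in on each round. Real agent harnesses do much more (tool selection, file edits, test runs), so treat this as the smallest version of the idea.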