Been playing around with fine-tuning models for specific languages as well (Clojure and Rust, mostly), but the persistent problem is finding high-quality datasets. Mostly I've been generating my own from my repositories and chat sessions. What approach are you taking for gathering the data?