Right now it feels like hammering a house onto a nail instead of the other way around.
LLMs have something not entirely unlike the "g factor" in humans - a broad "capability base" that spans domains. The best of the best "coding LLMs" need both good "in-domain training" for coding specifically and a high "capability base". And much of that "base" comes from model size and the scale of data and compute used in pre-training.
Reducing the model scale and pruning the training data would result in a model with a lower "base". It would also hurt in-domain performance - because capabilities generalize and transfer, and pruning C code from the training data would "unteach" the model things that also apply to code in PHP.
Thus, the pursuit of "narrow specialist LLMs" is misguided, as a rule.
Unless you have a well-defined bar that, once cleared, makes the task solved (with no risk of scope creep, no benefit from any future capability improvements above that bar, and enough load to justify the engineering cost of training a purpose-specific model), a "strong generalist" LLM is typically a better bet than a "narrow specialist".
In practice, this is an incredibly rare set of conditions to be met.
There are hardware-based limits on the size of LLMs you can feasibly train and serve, which impose a limit on the amount of information you can pack into a single model's weights, and on the amount of compute per second you can get out of that model at inference time.
My company has been working on this specifically, because even now most researchers don't seem to grasp that this is just as much an economics and knowledge problem (cf. Hayek) as it is an "intelligence" problem.
It is much more efficient to strategically delegate specialized tasks, or ones that require a lot of tokens but not much intelligence, to models that can be served more cheaply. This is one of the things Claude Code does very well. It's also the basis for MoE and some similar architectures, with a smarter router model serving as a common base for the experts.
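The delegation idea can be sketched in a few lines: route token-heavy but low-difficulty work to a cheap model, and reserve the expensive frontier model for tasks that need real reasoning. This is a minimal illustration, not any real API; the model names, prices, and the difficulty cutoff are all made-up placeholders.

```python
# Hypothetical cost-aware delegation sketch. Nothing here is a real
# provider API; names and per-1k-token prices are illustrative only.
from dataclasses import dataclass

@dataclass
class Task:
    estimated_tokens: int   # rough token budget the task will consume
    difficulty: float       # 0.0 (mechanical) .. 1.0 (frontier-level reasoning)

# (model name, illustrative price per 1k tokens)
CHEAP_MODEL = ("small-model", 0.1)
FRONTIER_MODEL = ("frontier-model", 3.0)

def route(task: Task, difficulty_cutoff: float = 0.6) -> str:
    """Send low-difficulty work to the cheap model, the rest to the frontier model."""
    name, _ = CHEAP_MODEL if task.difficulty < difficulty_cutoff else FRONTIER_MODEL
    return name

def cost(task: Task) -> float:
    """Estimated cost of the task under the routing policy above."""
    _, price = CHEAP_MODEL if route(task) == CHEAP_MODEL[0] else FRONTIER_MODEL
    return task.estimated_tokens / 1000 * price

# A bulk mechanical edit burns many tokens but needs little intelligence,
# so it goes to the cheap model; a subtle bug goes to the frontier model.
bulk_edit = Task(estimated_tokens=50_000, difficulty=0.2)
hard_bug = Task(estimated_tokens=2_000, difficulty=0.9)
print(route(bulk_edit), cost(bulk_edit))
print(route(hard_bug), cost(hard_bug))
```

The point of the sketch is the economics: the router itself needs almost no intelligence, yet it keeps the expensive model's capacity for the work that actually requires it.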
...with a fair amount of supervision, while frontier models would be running circles around them using project-specific memory and on-demand training (or whatever we would have by then).
If you're building something groundbreaking and new, the advantage will be slim to none.