Remember that utilization of these huge racks will not be 24/7, and these are usually not GPU-intensive shops that would train models on the spare compute. With prices of 100-200k USD and up, and a ~2 year lifetime, that would be hard to justify financially.
Self-hosting could easily amount to ~1000 USD a month, even amortized across many developers. During peak hours there will be hard rate limits.
Would saving 1500 - 1000 = 500 USD a month justify the 10% decrease in "AI productivity"? I guess not, in most cases.
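To make the back-of-the-envelope math concrete, here's a rough sketch in Python. Only the 100-200k USD price range and the ~2 year lifetime come from the comment above; the function names, the 20-developer headcount and the zero default for operating costs are my own placeholder assumptions, so treat it as a template to plug your own numbers into rather than a real cost model.

```python
def monthly_hardware_cost(price_usd: float, lifetime_months: int) -> float:
    """Straight-line amortization: spread the purchase price evenly over the lifetime."""
    if lifetime_months <= 0:
        raise ValueError("lifetime_months must be positive")
    return price_usd / lifetime_months


def per_developer_cost(
    price_usd: float,
    lifetime_months: int,
    developers: int,
    monthly_opex_usd: float = 0.0,  # power, hosting, staff; placeholder assumption
) -> float:
    """Monthly cost per developer: amortized hardware plus opex, split evenly."""
    if developers <= 0:
        raise ValueError("developers must be positive")
    total = monthly_hardware_cost(price_usd, lifetime_months) + monthly_opex_usd
    return total / developers


if __name__ == "__main__":
    lifetime_months = 24  # the ~2 year lifetime mentioned above
    developers = 20       # hypothetical headcount sharing one rack
    for price in (100_000, 200_000):  # the price range mentioned above
        cost = per_developer_cost(price, lifetime_months, developers)
        print(f"{price} USD rack: {cost:.0f} USD per developer per month (hardware only)")
```

Note this only covers hardware. Once you add opex and account for the rack sitting idle outside working hours, the per-developer figure only goes up.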
To everyone who asks me, I'd say that in the short term, unless there's a really good reason to self-host these coding assistant models, the big two or three coding assistant providers are the better choice.
No one ever got fired for licensing Claude Code.
You tried that on a personal machine for yourself once. It's a completely different calculation when serving a model to 3000 employees with ever-evolving hardware and software requirements. You'll need dedicated hardware in data centers and experts to run it. A company will need to figure out how to manage acquisition, assets and expenses, plus 1000 other things, in addition to its actual business. Guess who has figured out all of that already? AWS/Azure/OpenAI etc.
It costs money to maintain the hardware and hire experts to manage the services. For something as common as LLMs, there is absolutely no reason for a company to serve models on its own hardware unless it is paranoid about sending bytes to AWS.