But for code it's a race to the bottom, because it's all text and local models work quite well. You can host models on LangSmith and similar services, and because people mostly use those services to build chat bots, coding is a very tiny fraction of their overall usage. The race to the bottom is further exacerbated by the fact that as GPUs become more powerful you can serve more from each unit, so the cost of text inference will drop precipitously. Right now people are reporting that with some of the self-hosted services they are able to do everything for under $5 / month. That price WILL drop because that's how computers work.
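
To make the arithmetic behind that concrete, here is a rough back-of-the-envelope sketch. The GPU rental price and the users-per-GPU counts are made-up numbers picked only so the starting point lands on the $5 / month figure; they are assumptions, not measurements, and the function name is hypothetical.

```python
def monthly_cost_per_user(gpu_cost_per_month: float, users_per_gpu: int) -> float:
    """Hosting cost attributed to one user when a GPU is shared evenly."""
    if users_per_gpu <= 0:
        raise ValueError("users_per_gpu must be positive")
    return gpu_cost_per_month / users_per_gpu


# Hypothetical figures for illustration only (assumed, not from any provider).
GPU_COST_PER_MONTH = 500.0

# Each doubling of users served per GPU halves the per-user cost.
for users in (100, 200, 400):
    cost = monthly_cost_per_user(GPU_COST_PER_MONTH, users)
    print(f"{users} users per GPU: ${cost:.2f} per user per month")
```

The point is only the shape of the curve: if the GPU price stays roughly flat while each unit serves more users, the per-user cost falls in proportion.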