I find anything below 50 tps or so entirely unusable...
Regardless, it's apples to oranges anyway. Inference is quite cheap for open-weight models; it's just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various providers on OpenRouter, since open models are a commodity.
Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
“AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.
The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.
And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.
For customer-facing production software, it's worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.
This is unlike customer-facing systems where, if your database server goes down, you probably can't just use the other one--the whole system is down.
It’s really a lot more about business focus.
I don’t want to hire someone who understands how an email server works if I can pay Microsoft $10/employee/month for an email account.
Which category of developer tool has on-premise as the more popular option?
Cloud isn’t about “reliability,” it’s about being able to focus on your core business rather than spending all your time maintaining stuff.
You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
You can ask the same about the median 330k salary in the US for Uber Engineering... and, being a bit snarky, from attending Uber engineers' talks here and there at a few conferences, it looks like they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
People DO.
It's well known that most tech companies are run incompetently. As you say, it's not the engineers' fault.
But most projects and hiring in these companies exist to juice promotion criteria. And, depending on perspective, these companies are either massively overstaffed or massively underproductive.
The comparison to AI spending being wasteful holds up pretty well: these are companies that readily piss away billions in pointless spending.
I think it's a general problem, but in my rare conversations with execs nowadays, they seem rather uninterested in improving their decision making there. The actual performance of the organization does not appear to be all that relevant to them.
I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.
But in Uber's case, they tend to reinvent lower level pieces of platform/infra.
The idea of "if you add intelligence you make more money" is contradicted by the fact that companies don't just always hire more people. Why doesn't Google just hire everyone?
Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.
Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. Licensing to Fortune 100 companies running on their own infra is an obvious next step.
It is a race to the bottom and I’m not sure the labs win that race. We’ll see!
If the large, well-funded IT companies in the world believe the current AI cost is too high, then Anthropic, OpenAI and CoPilot have no actual customer base. AI is then relegated to a very profitable niche business, but that can't fund the R&D for the models.
Also, I don't believe you need to spend $1500 a month on a coding agent if you optimize usage at all.
For the employer those employees cost between 2945 - 7736 EUR per month based on https://kalkulatori.lv/lv/algas-kalkulators (income and social taxes).
So on the lower end that (1500 USD ~ 1300 EUR) is close to half the total expenses of such a developer; on the high end here, around 15-20%. That's quite significant, and whether it's worth it depends on whether their productivity also improves (if that's what the orgs care about).
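As a rough check of those percentages, here is a minimal sketch. It assumes the comment's own approximation of 1500 USD as 1300 EUR and the two employer-cost figures quoted above; the function and constant names are illustrative only.

```python
def ai_share_of_cost(ai_spend_eur: float, employer_cost_eur: float) -> float:
    """Return monthly AI spend as a fraction of monthly employer cost."""
    return ai_spend_eur / employer_cost_eur


AI_SPEND_EUR = 1300      # ~1500 USD, using the comment's approximation
LOW_COST_EUR = 2945      # lower-end employer cost per month (from the comment)
HIGH_COST_EUR = 7736     # higher-end employer cost per month (from the comment)

low_end_share = ai_share_of_cost(AI_SPEND_EUR, LOW_COST_EUR)
high_end_share = ai_share_of_cost(AI_SPEND_EUR, HIGH_COST_EUR)

print(f"lower-end developer:  {low_end_share:.0%} of employer cost")
print(f"higher-end developer: {high_end_share:.0%} of employer cost")
```

Under these inputs the lower end comes out a bit under half and the higher end falls inside the 15-20% range mentioned.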
And we’re not even the country with the worst pay out there, but pay the same for tokens, cause regional pricing isn’t a thing!
It probably allowed them to avoid hiring as many people to build a certain amount of software. Even if it didn't increase revenue, it could have lowered human labor costs.
> 128 GB machines that can run local LLMs are a bargain even if priced $5-8k.
Don't forget the energy costs. Searching around, advanced models use an average of 25 Wh per 1,000 tokens.
$1500/month gets you about 150M tokens.
At the aforementioned energy per token, that's 3,750 kWh.
What are your local office electricity rates/tariffs? (Hint: they are going up because of AI data centers). Even if my price and energy assumptions are wrong above, you probably aren't going to get the rates that the hyperscalers do.
Even at cheap (e.g. Texas) retail electricity rates, that many tokens will probably cost you hundreds per month. In most other electricity markets, probably far more.
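The arithmetic behind that estimate can be sketched as follows. The 25 Wh per 1,000 tokens and 150M tokens come from the comment; the two electricity rates are hypothetical placeholders (not quoted tariffs), so plug in your own.

```python
def monthly_energy_kwh(tokens: float, wh_per_1k_tokens: float) -> float:
    """Energy in kWh needed to generate `tokens` tokens."""
    watt_hours = tokens / 1000 * wh_per_1k_tokens
    return watt_hours / 1000  # Wh -> kWh


def monthly_energy_cost(tokens: float, wh_per_1k_tokens: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD for `tokens` tokens at a given retail rate."""
    return monthly_energy_kwh(tokens, wh_per_1k_tokens) * usd_per_kwh


TOKENS_PER_MONTH = 150_000_000   # from the comment: what $1500/month buys
WH_PER_1K_TOKENS = 25            # from the comment: average for advanced models

# Hypothetical retail rates in USD/kWh, for illustration only.
CHEAP_RATE = 0.10
EXPENSIVE_RATE = 0.30

kwh = monthly_energy_kwh(TOKENS_PER_MONTH, WH_PER_1K_TOKENS)
print(f"energy: {kwh:.0f} kWh/month")
for rate in (CHEAP_RATE, EXPENSIVE_RATE):
    cost = monthly_energy_cost(TOKENS_PER_MONTH, WH_PER_1K_TOKENS, rate)
    print(f"at ${rate:.2f}/kWh: ${cost:,.0f}/month")
```

Because the cost scales linearly with both the per-token energy and the rate, an error in either assumption shifts the result proportionally.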
Unless they are iteratively replacing expensive vendors and optimizing other headcount costs?
I get that if it's offline the security downside of XP doesn't matter, and I assume XP is free, but being free doesn't really seem that valuable compared to alternatives (free Linux and a virtually free OS if buying wholesale).
Even then it makes more sense to rent the bigger GPU and get your answer faster.
I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?
Hard tasks require a lot of guidance and code review, unless you are creating another throwaway project where correctness, maintainability and code understanding do not matter.
I have still found the sweet spot for me is using LLMs while I am still in the driver's seat.
Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value.
Uber (and quite a few Bay Area companies and startups) can afford to spend that money. There is no expectation of profit; Uber has lost ~62B and counting: https://uberlosses.com/
Its profit margin seems to have stabilized around 10%.
The real economic crime is losing at least $40bn over 10 years scaling a business that ended up having retail profit margins (i.e. low profit margins).
WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using AI coding tools, there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?
What would previously be janky internal dashboards or Excel sheets are now actually nice-to-use tools. That said, of course, the maintenance cost of all that has yet to be discovered, and the ROI is questionable.
OK. I guess that's good, too.
I don't think this would have been possible without having a solid engineering culture and processes in place before bringing in AI coding tools.
And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.
> We are shipping more features
That's not really the important question; the important question is: is it generating revenue? If you increase your spend -> ship more features -> no correlated increase in revenue, that's just burning money.
If a team of 10 spends 1 extra headcount ($180k/year) and ships features with no corresponding growth in revenue, what does that mean?
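The $180k/year figure appears to line up with the $1500/month per-engineer spend discussed elsewhere in the thread; that reading is an assumption, not something the comment states. A minimal sketch of the arithmetic, with illustrative names:

```python
def annual_team_ai_spend(engineers: int, usd_per_engineer_per_month: int) -> int:
    """Total yearly AI spend in USD for a team."""
    return engineers * usd_per_engineer_per_month * 12


TEAM_SIZE = 10               # from the comment
SPEND_PER_MONTH_USD = 1500   # assumed: the per-engineer figure used in the thread

total = annual_team_ai_spend(TEAM_SIZE, SPEND_PER_MONTH_USD)
print(f"${total:,} per year")
```

On that assumption, the team's tooling bill is the same size as one additional hire, which is the comparison the question is drawing.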
There was probably a reason it was on the backlog (because it didn't really have value).
Yes! :)
> There was probably a reason it was on the backlog (because it didn't really have value).
There are definitely things in the backlog with low value. We don't work those items, even if we could now. The additional bandwidth we have now goes to valuable features that drive revenue and retention metrics. The reason they were on the backlog was that we just didn't have the bandwidth to execute on them well, and they were just somewhat less valuable than the critical path items on the roadmap.
Software engineer quality of life.
There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a day's work in an hour then fucking off in a variety of ways.
Until companies start hiring 5x fewer engineers than they did before, and well... we are clearly moving in that direction.
As for building actually complex software, the art of that is not in simply chaining together such scripts. It's the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think LLMs are great at this, but they aren't terrible either. A lot of the more active users in the space are doing stuff where they've realised they need more detailed specs, which like, yeah, we knew this already - better defined problems lead to better software.
Coding faster doesn't really solve that.
Uber makes more money if people buy more rides, order more food, have some breakthrough in autonomous driving. They can save money if they can optimize some ops or spend somewhere. Is there any evidence that with the spend on AI that they achieved any of this? If they did, I'm sure we'd hear about it in some engineering blog.
Uber engineers do not define their revenue stream; the product leadership team does.
$1500/mo of AI spend by engineers does not equate to revenue. They need to figure out revenue first before zeroing in on AI spend.
Claude has let me finish refactors in a couple of days that would otherwise have taken weeks. It has, objectively, increased the velocity of the engineering component of greenfield features by 40% in my org. You can put a number value on that and decide if it gives you favorable ROI.
Software engineers like to talk as if business and finance are as easy as pushing code out and refactoring. It's not and never has been.
You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
It’s a lose/lose situation for…I would say anyone employed as an engineer or programmer. I’m not taking responsibility for AI output, the same way I won’t try to fix auto-generated code: because you just regenerate it.
The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
I'm pretty pessimistic on AI and don't have access to good agentic workflows, but refactors are exactly the thing where it seems to me like agents could be really strong - once I've refactored something architecturally, I might have hundreds of instances of a thing that needs to be updated in a predictable way, but is complicated enough that it's going to be faster for me to manually update hundreds of instances than to write a generalizable find/replace tool.
Absolutely false. Refactors (in my case) can be as simple as dropping old packages for newer packages with slightly different semantics. It can be moving legacy pages from jQuery to Vue.
> You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
I've 25 years coding, trust me, I don't lose anything by not finding out on my own that the semantics of a jQuery promise changed between major versions.
> The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
You have no idea what you're talking about. There are entire classes of K8s networking issues that would have taken me a day to debug which Claude solved in minutes, just because it can run 20 diagnostic commands in two minutes and deal with technical minutiae that are time-consuming but ultimately irrelevant to my business goals.
I effectively get to operate at the rate of a small team of engineers - I know that because I've managed small teams of engineers in the past.
I think this is the part I struggle with. The code I write makes me money or is a way of teaching me something, both of which are reasons that I would write the code regardless.
I don’t think I have any projects in mind that I’d be willing to spend half of a car on that I also wouldn’t have written myself.
Obviously just a personal take though. I’m glad you get the usage you want out of it.
I use Gemini/ChatGPT/Claude to do that work and it unblocked the enjoyable parts of the project while taking care of the tedium.
I also find LLMs help me learn faster because they can often take a paper and turn it into working code, which I otherwise find to be a very slow process.