It's absolutely insane. 1500 * 12 is north of 17K dollars, I know that in Google outside of a few specific cities and roles.
Getting a 17k bump in salary is good enough to switch. If I was being paid 17k extra, I'd be more than willing to use my local Qwen and hand-code most if not all of the stuff.
Companies can pay for code review tools to make life easier, but writing code with AI, if it amounts to a 10-15% pay cut, is just too much.
Everyone is happy right now because this money hasn't been a line item in your salary/benefits.
Imagine a 10k yearly AI allowance. I would probably just ask to keep that money.
If I were judicious, I could do all the work I do now with a $20 spend or on a local model.
Few tasks need Mythos-like models, and if your task does, you are already doing too much with AI.
It’s been a clear pattern in software service delivery for a while now. Hell, for many goods and services in general, like Uber rides themselves.
Start cheap, get some vendor lock-in, the service provider reduces discounts, the consumer notices and then reacts to the price by reducing consumption.
One organization, that is a software company
> which seems to be roughly inline with "normal" consumption for most full-time engineers
My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.
Uber is not representative of any trend beyond big tech and VC-overfunded startups.
But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.
The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.
It's the pipeline, not the model, that gets you quality at a given token budget.
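To make the routing idea concrete, here is a minimal sketch of what such a harness could look like. Everything in it is an assumption for illustration: the model names are placeholders, `call_model` stands in for whatever client you actually use, and the judge's "pass"/"fail" convention is made up. It is not the orchestrator described above, just one way to pin each stage to a model and gate it with an LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical routing table: stage name -> model name (placeholders only).
STAGE_MODELS: Dict[str, str] = {
    "plan": "large-model",
    "design": "large-model",
    "code": "mid-model",
    "build": "small-model",
    "test": "small-model",
}
JUDGE_MODEL = "mid-model"  # hypothetical model used to judge stage output


@dataclass
class StageResult:
    stage: str
    model: str
    output: str
    accepted: bool


def run_pipeline(
    task: str,
    call_model: Callable[[str, str], str],
    max_retries: int = 1,
) -> List[StageResult]:
    """Run each stage on its assigned model; a judge model gates the handoff."""
    results: List[StageResult] = []
    context = task
    for stage, model in STAGE_MODELS.items():
        output, accepted = "", False
        for _ in range(max_retries + 1):
            output = call_model(model, f"[{stage}] {context}")
            verdict = call_model(JUDGE_MODEL, f"[judge:{stage}] {output}")
            # Assumed convention: the judge answers starting with "pass" or "fail".
            accepted = verdict.strip().lower().startswith("pass")
            if accepted:
                break
        results.append(StageResult(stage, model, output, accepted))
        if not accepted:
            break  # stop early instead of feeding rejected output downstream
        context = output
    return results


def fake_call_model(model: str, prompt: str) -> str:
    """Stand-in for a real client so the sketch runs offline."""
    if prompt.startswith("[judge:"):
        return "pass"
    return f"{model} handled: {prompt}"


if __name__ == "__main__":
    for r in run_pipeline("add a login form", fake_call_model):
        print(r.stage, "->", r.model, "accepted" if r.accepted else "rejected")
```

The point of the table is that swapping a cheaper model into one stage is a one-line change, and the judge step tells you whether quality held up.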
You want to master your craft, develop "optimal" systems, and understand where things are going by utilizing SOTA.
You can call it FOMO, but you get the point.
$1,500/mo * 14 months = $21,000.
If local models are 14mo behind, as many on HN say, it may be profitable to just wait. Maybe just spend a few hundred dollars of your tokens and buy hardware piece by piece.
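For anyone who wants to plug in their own numbers, a rough sketch of the arithmetic. The $1,500/mo and 14 months come from the comment above. The hardware price in the example is purely hypothetical, and the calculation ignores electricity, resale value, and any quality gap between local and hosted models.

```python
import math


def cumulative_spend(monthly_spend: float, months: int) -> float:
    """Total token spend over a number of months at a flat monthly rate."""
    return monthly_spend * months


def months_to_break_even(hardware_cost: float, monthly_spend: float) -> int:
    """Whole months of token spend needed to equal a one-off hardware cost."""
    return math.ceil(hardware_cost / monthly_spend)


if __name__ == "__main__":
    print(cumulative_spend(1500, 14))  # the 14-month figure from the thread
    print(cumulative_spend(1500, 12))  # the yearly figure from the thread
    # Hypothetical $6,000 rig, chosen only to show the call.
    print(months_to_break_even(6000, 1500))
```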
Small models are fine for small coding tasks, but I don't see why big ones can't be broken down most of the time.
This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4 (99-100% cache hit).
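A quick sanity check on what those figures imply per million tokens, plus a sketch of how cache hit rate drives a blended input price. The 1.3 billion tokens and $4 are the numbers quoted above. The per-million prices passed to `blended_input_price` are hypothetical and not any provider's actual rates.

```python
def effective_price_per_million(total_cost: float, total_tokens: float) -> float:
    """Dollars paid per one million tokens, averaged over all usage."""
    return total_cost / (total_tokens / 1_000_000)


def blended_input_price(
    hit_rate: float, cached_price: float, uncached_price: float
) -> float:
    """Average input price per million tokens for a given cache hit rate (0..1)."""
    return hit_rate * cached_price + (1.0 - hit_rate) * uncached_price


if __name__ == "__main__":
    # Figures from the comment: $4 for 1.3 billion tokens.
    print(round(effective_price_per_million(4.0, 1.3e9), 4))
    # Hypothetical prices: $0.10/M for cached input, $1.00/M for uncached.
    print(round(blended_input_price(0.99, 0.10, 1.00), 3))
```

At a hit rate close to 100%, the blended price is dominated by the cached rate, which is why the cache matters more than the headline per-token price.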
This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
> Review everything and point them in the right direction
Sorry, upper management doesn't care. That's an engineering problem that you need to solve.
I believe it can be great for vibe coding, but mundane day-to-day work? Hell no, I'd rather work with Haiku. It's too slow, checks too many things, it's annoying as hell.