It's pretty simple: organizations are willing to tolerate paying $1500/month/engineer, which seems to be roughly in line with "normal" consumption for most full-time engineers. If that number grows significantly, then I bet companies will start exploring flash models more, as you propose.
reply
They are willing to tolerate it now, which is quite a switch from the free-for-all we had a few weeks ago. If they aren’t able to tie this new ~$1500/month cap to demonstrable productivity and revenue increases, then it will be kneecapped even faster.
reply
There are plenty of expenses in this order of magnitude that are not tied to direct increases in productivity. For example, I think being really skimpy on these budgets may become a serious hiring impediment for companies.
reply
There was a time when some employees wanted $1000 per month for rent, imagine that.

It's absolutely insane. 1500 * 12 is north of 17K dollars, I know that in Google outside of a few specific cities and roles.

Getting a 17k bump in salary is good enough to switch. If I were being paid 17k extra, I would be more than willing to use my local Qwen and hand-code most if not all of my work.

Companies can pay for code review tools to make life easier, but writing code with AI is just too much if it means a 10-15% pay cut.

Everyone is happy right now because this money hasn't been a line item in your salary/benefits.

Imagine a 10k yearly AI allowance. I would probably just ask to keep that money.

If I were judicious, I could do all the work I do now with a $20 spend or on a local model.

Few tasks need Mythos-like models, and if your task does, you are already doing too much with AI.

reply
I mean, we saw this with cloud spending, and especially with logging and database read/write costs, across numerous companies.

It’s been a clear pattern in software service delivery for a while now. Hell, it holds for many goods and services in general, like Uber rides themselves.

Start cheap, get some vendor lock-in, the service provider reduces discounts, the consumer notices and then reacts to the price by reducing consumption.

reply
> organizations are willing to tolerate paying $1500/month/engineer

One organization, and it is a software company.

> which seems to be roughly in line with "normal" consumption for most full-time engineers

My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.

reply
Which organizations?

Uber is not representative of any trend beyond big tech and VC-overfunded startups.

reply
The easy decision is to just go with the biggest SOTA model you can afford.

But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.

The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.

It's the pipeline, not the model, that gets you quality at a given token budget.
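
A minimal sketch of what per-stage model assignment with a judge could look like. Everything here is an assumption for illustration: the tier names (`small`, `medium`, `large`), the `call_model` and `judge` callables, and the retry-on-the-large-tier policy are hypothetical and not taken from any particular orchestrator.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage-to-tier mapping; swap in whatever models you actually use.
STAGE_MODEL = {
    "plan": "large",
    "design": "large",
    "code": "medium",
    "build": "small",
    "test": "small",
}

@dataclass
class StageResult:
    stage: str
    model: str
    output: str

def run_pipeline(
    task: str,
    call_model: Callable[[str, str], str],  # (model, prompt) -> output
    judge: Callable[[str, str], bool],      # (stage, output) -> accept?
    max_retries: int = 1,
) -> list[StageResult]:
    """Run each stage on its assigned model; a judge gates every hand-off."""
    results = []
    context = task
    for stage, model in STAGE_MODEL.items():
        output = call_model(model, f"[{stage}] {context}")
        attempts = 0
        # Assumed policy: if the judge rejects the output, retry on the large tier.
        while not judge(stage, output) and attempts < max_retries:
            model = "large"
            output = call_model(model, f"[{stage}] {context}")
            attempts += 1
        results.append(StageResult(stage, model, output))
        context = output  # the next stage consumes this stage's output
    return results

# Stub usage: the "model" just echoes its tier and the judge accepts everything.
for r in run_pipeline("add a login form", lambda m, p: m, lambda s, o: True):
    print(r.stage, r.model)
```

The point of the sketch is that the routing table and the judge live in the harness, so the token budget is set by the pipeline rather than by whichever single model is plugged in.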

reply
There is something about using the most advanced tooling possible. Why would you pay for IntelliJ, if Eclipse can do the same thing a bit worse?

You want to master your craft, develop "optimal" systems, understand where things are going by utilizing SOTA.

You can call it FOMO, but you get the point.

reply
Is your argument that $1500 / mo is too much? Why would the engineering team not be more rigorous in their model selection given a constraint?
reply
If you had a business task to complete that was only possible with AI and it cost you >$1500/month, how long would you have to delay the task so that it's cheaper in the long run to buy hardware and run local models?

$1,500/mo * 14 months = $21,000.

If local models are 14 months behind, as many on HN say, it may be profitable to just wait. Maybe just spend a few hundred dollars on tokens and buy hardware piece by piece.
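
To make that arithmetic concrete, here is a rough break-even sketch. The $1,500/month figure and the 14-month horizon come from this comment; the hardware price and the monthly running cost are placeholder assumptions, not quotes for any real machine.

```python
import math

def cumulative_api_spend(monthly_spend: float, months: int) -> float:
    """Total paid to a hosted provider over `months`."""
    return monthly_spend * months

def break_even_months(
    hardware_cost: float,
    monthly_api_spend: float,
    monthly_running_cost: float = 0.0,
) -> float:
    """Months of hosted spend after which buying hardware is cheaper.

    Returns math.inf if running the hardware costs at least as much as the API.
    """
    monthly_saving = monthly_api_spend - monthly_running_cost
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving

# Figures from the comment: $1,500/month for 14 months.
print(cumulative_api_spend(1500, 14))  # 21000

# Hypothetical rig: $12,000 up front, $100/month in power.
print(round(break_even_months(12_000, 1500, 100), 1))  # 8.6
```

This only compares dollars spent. It says nothing about whether the local model is good enough for the task.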

reply
Nearly no one is doing anything that is “only possible with AI”. This doesn’t seem like a relevant calculation. People spend on AI as an investment in their current productivity.
reply
There's a lot of opportunity cost to waiting 14 months to build something.
reply
I agree. Outside of the AI bubble, there's a lot of wait-and-see happening in the B2B world right now. I'd say we're currently 6-8 months into those 14 months.
reply
It also presupposes that open models will bridge that gap towards Opus 4.5, which was really when I drank the AI coding Kool-Aid.
reply
I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly? Perhaps, if it can measure complexity, even generate a quote?

Small models are fine for small coding tasks, but I don't see why big tasks can't be broken down most of the time.
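
As a rough sketch of the "measure complexity, then quote" idea: the scoring heuristic, the keyword list, the token estimate, and the prices below are all made-up placeholders. A real router would presumably use a trained classifier or a cheap model as the scorer.

```python
from dataclasses import dataclass

# Hypothetical prices in dollars per million tokens; not real provider rates.
PRICE_PER_MTOK = {"small": 0.50, "large": 15.00}

@dataclass
class Quote:
    model: str
    est_tokens: int
    est_cost: float

def estimate_complexity(query: str) -> float:
    """Crude stand-in score in [0, 1]: longer queries and certain keywords score higher."""
    length_score = min(len(query.split()) / 200, 1.0)
    keywords = ("refactor", "architecture", "migrate", "concurrency")
    keyword_score = 0.5 if any(k in query.lower() for k in keywords) else 0.0
    return min(length_score + keyword_score, 1.0)

def quote(query: str, threshold: float = 0.5) -> Quote:
    """Pick a tier from the complexity score and attach an estimated price."""
    score = estimate_complexity(query)
    model = "large" if score >= threshold else "small"
    est_tokens = int(2_000 + 20_000 * score)  # assumed token budget per query
    est_cost = est_tokens / 1_000_000 * PRICE_PER_MTOK[model]
    return Quote(model, est_tokens, round(est_cost, 4))

print(quote("rename this variable"))
print(quote("refactor the auth module to support concurrency"))
```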

reply
Many harnesses do this. I've recently dropped all my big subscriptions in favor of DeepSeek. Codewhale (formerly deepseek-tui) will use pro for large tasks and route smaller ones to flash. It's pretty good, but I just use pro for everything as the cost is quite low.

This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4 (99-100% cache hit).

reply
> I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?

This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.

reply
Yes, they are all already doing this
reply
This a thousand times. The bigger models also have a habit of overcomplicating things.
reply
> Don't ask LLMs for big changes

> Review everything and point them in the right direction

Sorry, upper management doesn't care. That's an engineering problem that you need to solve.

reply
They were proposing a solution: use flash models, and use them in a way that best amplifies your work.
reply
He was making a joke.
reply
Indeed I was. But that's lost on people here.
reply
I'm legit annoyed at opus 4.8 at any setting above 4.8.

I believe it can be great for vibe coding, but mundane day-to-day work? Hell no, I'd rather work with Haiku. Opus is too slow, checks too many things, and it's annoying as hell.

reply