by f311a1 days ago |

by _jab23 hours ago|

It's pretty simple: organizations are willing to tolerate paying $1500/month/engineer, which seems to be roughly in line with "normal" consumption for most full-time engineers. If that number grows significantly, then I bet companies will start exploring flash models more, as you propose.

by lavezzi23 hours ago|

They are willing to tolerate it now, which is quite a switch-up from the free-for-all we had a few weeks ago. And if they aren't able to tie this new ~$1500/month cap to demonstrable productivity and revenue increases, then it will be kneecapped even faster.

by phreeza12 hours ago|

There are plenty of expenses of this order of magnitude that are not tied to direct increases in productivity. For example, I think being really skimpy on these budgets may become a serious hiring impediment for companies.

by minraws2 hours ago|

There was a time when some employees wanted $1000 per month for rent. Imagine that.

It's absolutely insane: 1500 * 12 is north of 17K dollars. I know that in Google, outside of a few specific cities and roles, getting a 17k bump in salary is good enough to switch.

If I was being paid 17k extra, I am more than willing to use my local Qwen and hand-code most if not all stuff.

Companies can pay for code review tools to make life easier, but writing code with AI if it's a 10-15% pay cut is just too much.

Everyone is happy right now because this money hasn't been a line item in your salary/benefits.

Imagine 10k yearly AI allowance, I will probably just ask to keep that money.

If I was judicious, I could do all the work I do with a $20 spend or on a local model.

Few tasks need Mythos-like models, and if your task does, you are already doing too much with AI.

by aiisjustanif1 hours ago|

I mean, we saw this with cloud spending, and especially with logging and database read/write costs, across numerous companies.

It’s been a clear pattern in service delivery for software for a while now. Hell, for many goods and services in general, like Uber rides themselves.

Start cheap, get some vendor lock-in, service provider reduces discounts, consumer notices and then reacts to the price by reducing consumption.

by rudedogg22 hours ago|

> organizations are willing to tolerate paying $1500/month/engineer

One organization, and it is a software company.

> which seems to be roughly inline with "normal" consumption for most full-time engineers

My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.

by epolanski20 hours ago|

Which organizations?

Uber is not representative of any trend beyond big tech and VC-overfunded startups.

by mrothroc23 hours ago|

The easy decision is to just go with the biggest SOTA model you can afford.

But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.

The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.

It's the pipeline, not the model, that gets you quality at a given token budget.
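
As a rough illustration of the per-stage model selection described above, here is a minimal sketch. It is not the commenter's orchestrator: the stage names, model labels, thresholds, and the `call_model` and `judge` callables are all hypothetical stand-ins, and the stubs at the bottom replace real LLM calls so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str         # pipeline step, e.g. "plan" or "code"
    model: str        # hypothetical model label assigned to this step
    min_score: float  # judge score required to move on


def run_pipeline(
    task: str,
    stages: List[Stage],
    call_model: Callable[[str, str, str], str],
    judge: Callable[[str, str], float],
    max_retries: int = 1,
) -> Tuple[str, list]:
    """Run each stage on its assigned model and gate the output with a judge."""
    artifact = task
    log = []
    for stage in stages:
        for attempt in range(max_retries + 1):
            output = call_model(stage.model, stage.name, artifact)
            score = judge(stage.name, output)
            log.append((stage.name, stage.model, attempt, score))
            if score >= stage.min_score:
                break
        else:
            # Every attempt scored below the threshold.
            raise RuntimeError(f"stage {stage.name!r} failed the judge")
        artifact = output
    return artifact, log


# Stubs standing in for real model and judge calls (assumptions for the demo).
def fake_call_model(model: str, stage_name: str, artifact: str) -> str:
    return f"{stage_name}[{model}]({artifact})"


def fake_judge(stage_name: str, output: str) -> float:
    return 1.0


STAGES = [
    Stage("plan", "large-model", 0.8),  # expensive model only where it matters
    Stage("code", "small-model", 0.7),
    Stage("test", "small-model", 0.7),
]

if __name__ == "__main__":
    result, log = run_pipeline("task", STAGES, fake_call_model, fake_judge)
    print(result)
```

The point of the sketch is only that the stage-to-model mapping lives in the harness, so swapping a cheaper model into one stage is a one-line change.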

by chaoz_4 hours ago|

There is something about using the most advanced tooling possible. Why would you pay for IntelliJ, if Eclipse can do the same thing a bit worse?

You want to master your craft, develop "optimal" systems, understand where things are going by utilizing SOTA.

You can call it FOMO, but you get the point.

by jmtulloss20 hours ago|

Is your argument that $1500 / mo is too much? Why would the engineering team not be more rigorous in their model selection given a constraint?

by gravypod20 hours ago|

If you had a business task to complete that was only possible with AI and it cost you >$1500/month of work, how long would you have to delay the task so that it's cheaper in the long run to buy hardware and run local models?

$1,500/mo * 14 months = $21,000.

If local models are 14 months behind, as many on HN say, it may be profitable to just wait. Maybe just spend a few hundred dollars of your tokens and buy hardware piece by piece.
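
The arithmetic above can be turned into a small break-even check. This is only a sketch: the $1,500/month figure and the 14-month horizon come from the comment, while the hardware price and the monthly running cost (power, maintenance) are hypothetical inputs you would have to supply yourself.

```python
import math
from typing import Optional


def cumulative_api_spend(monthly_api_spend: float, months: int) -> float:
    """Total hosted-model spend over a number of months."""
    return monthly_api_spend * months


def breakeven_months(
    hardware_cost: float,
    monthly_api_spend: float,
    monthly_running_cost: float = 0.0,
) -> Optional[int]:
    """Months until buying hardware is cheaper than paying for tokens.

    Returns None when local running costs eat the whole saving.
    """
    saving = monthly_api_spend - monthly_running_cost
    if saving <= 0:
        return None
    return math.ceil(hardware_cost / saving)


if __name__ == "__main__":
    # Figures from the comment: $1,500/month over 14 months.
    print(cumulative_api_spend(1500, 14))
    # Hypothetical hardware priced at that same total, with no running cost.
    print(breakeven_months(21_000, 1500))
```

Note that this ignores the opportunity cost of waiting, which the replies bring up.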

by therealdrag013 hours ago|

Nearly no one is doing anything that is “only possible with AI”. This doesn’t seem like a relevant calculation. People spend on AI as an investment in their current productivity.

by pchristensen18 hours ago|

There's a lot of opportunity cost to waiting 14 months to build something.

by garrickvanburen17 hours ago|

I agree. Outside of the AI bubble, there's a lot of wait-and-see happening in the B2B world right now. I'd say we're currently 6-8 months into those 14 months.

by edmundsauto16 hours ago|

It also presupposes that open models will bridge that gap towards Opus 4.5, which was really when I drank the AI coding Kool-Aid.

by econ23 hours ago|

I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly? Perhaps, if it can measure complexity, even generate a quote?

Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.
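
To make the "measure complexity, then quote" idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the tier names and prices, the keyword list, the threshold, and the token estimate are made up, and a real router would likely use a classifier or a small model instead of keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float  # hypothetical price


@dataclass(frozen=True)
class Quote:
    tier: str
    estimated_tokens: int
    usd: float


SMALL = Tier("small-model", 0.5)
LARGE = Tier("large-model", 15.0)

# Hypothetical keywords hinting that a request is hard.
HARD_HINTS = ("refactor", "architecture", "concurrency", "migrate", "design")


def estimate_complexity(prompt: str) -> float:
    """Crude 0..1 score from prompt length and 'hard' keywords."""
    text = prompt.lower()
    length_score = min(len(text.split()) / 200, 1.0)
    hint_score = sum(hint in text for hint in HARD_HINTS) / len(HARD_HINTS)
    return max(length_score, hint_score)


def quote(prompt: str, threshold: float = 0.3) -> Quote:
    """Pick a tier from the complexity score and price the request up front."""
    complexity = estimate_complexity(prompt)
    tier = LARGE if complexity >= threshold else SMALL
    estimated_tokens = int(2_000 + 20_000 * complexity)  # assumed token budget
    usd = estimated_tokens * tier.usd_per_million_tokens / 1_000_000
    return Quote(tier.name, estimated_tokens, usd)


if __name__ == "__main__":
    print(quote("Rename variable foo to bar"))
    print(quote("Refactor the payment service architecture and fix the concurrency bugs"))
```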

by AgentMasterRace23 hours ago|

Many harnesses do this. I've recently dropped all my big subscriptions in favor of DeepSeek. Codewhale (formerly deepseek-tui) will use pro for large tasks and route smaller ones to flash. It's pretty good, but I just use pro for everything as the cost is quite low.

This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4 (99-100% cache hit).
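
For a sense of scale, the figures in this comment work out to well under a cent per million tokens. The sketch below computes that effective rate and shows how a blended price falls as the cache hit rate rises; the function names are illustrative, and any cached or uncached prices you pass in are your own assumptions, not a real provider's price list.

```python
def effective_usd_per_million(total_usd: float, total_tokens: float) -> float:
    """Average price actually paid per million tokens."""
    return total_usd / total_tokens * 1_000_000


def blended_usd_per_million(
    cache_hit_rate: float,
    cached_price: float,
    uncached_price: float,
) -> float:
    """Expected input price per million tokens for a given cache hit rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * cached_price + (1.0 - cache_hit_rate) * uncached_price


if __name__ == "__main__":
    # Figures from the comment: $4 for 1.3 billion tokens.
    print(f"{effective_usd_per_million(4, 1.3e9):.5f} USD per million tokens")
```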

by ValentineC23 hours ago|

> I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?

This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.

by jorl1721 hours ago|

Yes, they are all already doing this.

by andersmurphy22 hours ago|

This a thousand times. The bigger models also have a habit of overcomplicating things.

by warmwaffles1 days ago|

> Don't ask LLMs for big changes

> Review everything and point them in the right direction

Sorry, upper management doesn't care. That's an engineering problem that you need to solve.

by eikenberry1 days ago|

They were proposing a solution: to use flash models, and to use them in a way that best amplifies your work.

by AgentMasterRace23 hours ago|

He was making a joke.

by warmwaffles2 hours ago|

Indeed I was. But that's lost on people here.

by epolanski20 hours ago|

I'm legit annoyed at opus 4.8 at any setting above 4.8.

I believe it can be great for vibe coding, but mundane day-to-day work? Hell no, I'd rather work with Haiku. Opus is too slow and checks too many things; it's annoying as hell.
