Tokens are insanely cheap at the moment. Through OpenRouter a message to Sonnet costs about $0.001 cents or using Devstral 2512 it's about $0.0001. An extended coding session/feature expansion will cost me about $5 in credits. Split up your codebase so you don't have to feed all of it into the LLM at once and it's very reasonable.
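Napkin math on those numbers, treating the per-message prices above as given and the message count as a pure assumption:

```python
# Assumed per-message prices from the comment above; real OpenRouter
# pricing varies with model, prompt length, and caching.
SONNET_PER_MSG = 0.001     # dollars per message (assumption)
DEVSTRAL_PER_MSG = 0.0001  # dollars per message (assumption)

messages = 5_000  # hypothetical extended coding session / feature expansion

sonnet_session = messages * SONNET_PER_MSG
devstral_session = messages * DEVSTRAL_PER_MSG
print(f"Sonnet:   ${sonnet_session:.2f}")    # $5.00
print(f"Devstral: ${devstral_session:.2f}")  # $0.50
```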
reply
It cost me ~$750 to find a tricky privilege escalation bug in a complex codebase where I knew the rough specs but didn't have the exploit. There are certainly still many other bugs like that in the codebase, and it would cost $100k-$1MM to explore the rest of the system that deeply with models at or above the capability of Opus 4.6.

It's definitely possible to do a basic pass for much less (I do this with autopen.dev), but it is still very expensive to exhaustively find the harder vulnerabilities.

reply
This is where the Codex and Claude Code Pro/Max plans are excellent. I rarely run into the limits of Codex. If I do, I wait and come back and have it resume once the window has expired.
reply
Claude and Codex pro/max subs aren't supposed to be used for commercial/enterprise development, so it's not really an option for execs in enterprise. They need to take into account API costs.

At my F500 company execs are very wary of the costs of most of these tools and it's always top of mind. We have dashboards and gather tons of internal metrics on which tools devs are using and how much they are costing.

reply
No, I think that’s wrong. They aren’t supposed to be put behind a service, but they can certainly be used to write professional products/products for the enterprise.
reply
deleted
reply
Are they also measuring productivity? Measuring only token costs is like looking only at grocery spend but not the full receipt: you don’t know whether you fed your family for a week or for only a day.
reply
I'm not one of those execs, I'm just echoing what I've heard from those who manage these dashboards and worry about this. I do think measuring productivity is not very clear-cut, especially with these tools.

They do "attempt" to measure productivity. But they also just see large dollar amounts on AI costs and get wary.

My company is also wary of going all in with any one tool or company due to how quickly stuff changes. So far they've been trying to pool our costs across all tools together and give us an "honor system" limit we should try not to go above per month until we do commit to one suite of tools.

reply
First you have to figure out HOW to measure productivity.
reply
(Output / input), both of which are usually measured in money. If you can measure both of those things--and you have bigger problems if your finance department can't--it logically follows that you can measure productivity.
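A sketch of that ratio with made-up numbers (every figure here is hypothetical, purely for illustration):

```python
def productivity(output_value: float, input_cost: float) -> float:
    """Productivity as value produced per dollar spent (output / input)."""
    return output_value / input_cost

# Hypothetical: a team ships $50k of value on $10k of total spend.
baseline = productivity(50_000, 10_000)        # 5.0 value per dollar
# If AI adds $500 of token cost but lifts output to $60k:
with_ai = productivity(60_000, 10_000 + 500)   # ~5.71 value per dollar
print(baseline, with_ai)
```

The point being: token spend only shows up in the denominator; without a number for the numerator, the dashboard tells you nothing about productivity.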
reply
Measuring strictly in terms of money per unit time over a small enough timeframe is difficult, because not all tasks directly produce immediately observable results.

There are tasks worked on at large enterprises that have 5+ year horizons, and those can't all immediately be tracked in terms of monetary gain that can be correlated with AI usage. We've barely even had AI as a daily tool used for development for a few years.

reply
> Claude and Codex pro/max subs aren't supposed to be used for commercial/enterprise development

lolwut?

reply
Read ToS.
reply
I just did. Tell me where it states what you are claiming. Neither my reading (IANAL) nor ChatGPT’s reading could find such a blanket ban:

https://www.anthropic.com/legal/consumer-terms

reply
From your link:

> Non-commercial use only. You agree that you will not use our Services for any commercial or business purposes and we and our Providers have no liability to you for any loss of profit, loss of business, business interruption, or loss of business opportunity.

There are separate commercial terms for Team/Enterprise/API usage: https://www.anthropic.com/legal/commercial-terms

reply
I suspect you are accessing their website from a European IP address. The clause you quoted is not present for users outside of the EU/UK.

https://news.ycombinator.com/item?id=47590473

reply
That explains it. I don’t see it from my US IP address.
reply
How much would it have cost a human to do the same work? The question isn’t how much tokens cost; the question is how much money is saved by using AI to do it.
reply
Does the person prompting the AI work for free?
reply
Let's assume they don't.
reply
Compare to the cost when said vulnerabilities are exploited by bad actors in critical systems. Worth it yet?
reply
Agentic tasks use up a huge amount of tokens compared to simple chatting. Every elementary interaction the model has with the outside world (even while doing something as simple as reading code from a large codebase) is a separate "chat" message and "response", and these add up very quickly.
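A rough sketch of why that adds up so fast: if each agent step resends the whole transcript as input, billed input tokens grow roughly quadratically with step count. All numbers below are illustrative assumptions, not measured values:

```python
# Illustrative: each step re-reads the full growing transcript.
SYSTEM_PROMPT = 2_000    # tokens (assumed)
TOKENS_PER_STEP = 1_500  # tool result + model response per step (assumed)

def billed_input_tokens(steps: int) -> int:
    total = 0
    context = SYSTEM_PROMPT
    for _ in range(steps):
        total += context           # full transcript billed as input each step
        context += TOKENS_PER_STEP # transcript grows every step
    return total

print(billed_input_tokens(10))   # chat-sized exchange: 87,500 tokens
print(billed_input_tokens(200))  # agentic task: 30,250,000 tokens, ~345x more
```

20x the steps costs ~345x the input tokens under these assumptions (prompt caching reduces this in practice, but the shape of the curve is why agents burn through credits).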
reply
You’d have to ignore the massive investor ROI expectations, or somehow be unable to look past “at the moment”.
reply
That might be a problem for the labs (although I don't think it is), but it's not a problem for end-users. There is enough pressure from top labs competing with each other, and even more pressure from open models, that prices should stay at a reasonable point going forward.

To justify higher prices, the SotA needs to have way higher capabilities than the competition, and at the same time the competition needs to be way below a certain threshold. Once that threshold becomes "good enough for task x", the higher price doesn't make sense anymore.

While there is some provider stickiness today, it will be harder to maintain once everyone offers kinda sorta the same capabilities. Changing API providers might even be transparent for most users, and they wouldn't care.

If you want an idea about token prices today, you can check the median for serving open models on OpenRouter or similar platforms. You'll get a "napkin math" estimate for what it costs to serve a model of a certain size today. As long as models don't grow orders of magnitude larger than today's largest models, API pricing seems in line with a modest profit (so it shouldn't be subsidised, and it should drop with tech progress). Another benefit of open models is that once they're released, that capability remains there. The models can't get "worse".

reply
Not really. I'm fully taking advantage of these low prices while they last. Eventually the AI companies will start running out of funny money and start charging what the models actually cost to run, then I just switch over to using the self hosted models more often and utilize the online ones for the projects that need the extra resources. Currently there's no reason for why I shouldn't use Claude Sonnet to write one time bash scripts, once it starts costing me a dollar to do so I'm going to change my behavior.
reply
> Currently there's no reason for why I shouldn't use Claude Sonnet to write one time bash scripts, once it starts costing me a dollar to do so I'm going to change my behavior.

This just isn't going to happen: we have open-weights models, whose running costs we can roughly calculate, that are on the level of Sonnet _right now_. The best open-weights models used to be 2 generations behind, then they were 1 generation behind, now they're on par with the mid-tier frontier models. You can choose among many different Kimi K2.5 providers. If you believe that every single one of those is running at 50% subsidies, be my guest.

reply
> start charging what the models actually cost to run

The political climate won't allow that to happen. The US will do everything to stay ahead of China, and a rise in prices means a sizeable migration to Chinese models, giving them that much more data to improve their models and pass the US in AI capability (if they haven't already).

But it'll also happen in a way, as eventually models will become optimized enough that run costs become more or less negligible from a sustainability perspective.

reply
I also have this feeling. But do you ever doubt that, when the time comes, we will be like the boiled frog? Where it's "just so convenient", or the reality of setting up a local AI is just a worse experience for a large upfront cost?
reply
Worse: he's already boiled. Probably paying way more than that one dollar per bash script with all the subscriptions he already has.
reply
Yeah, the $20 I paid to OpenRouter about 4 months ago really cost me an arm and a leg, not sure where I'll get my next meal if I'm to be honest.
reply
>$0.001 cents

$0.001 (1/10 of a cent) or 0.001 cents (1/1000 of a cent, or $0.00001)?
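The two readings differ by a factor of 100:

```python
# "$0.001" read as dollars vs. "0.001 cents" converted to dollars.
reading_dollars = 0.001        # $0.001 = one tenth of a cent per message
reading_cents = 0.001 / 100    # 0.001 cents = $0.00001 per message

print(reading_dollars / reading_cents)
```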

reply
Tokens aren't more expensive than highly trained meatbags today. There's no way they'll be more expensive "tomorrow"...
reply
[flagged]
reply
> they are and they will be

Calculate the approximate cost of raising a human from birth to having the knowledge and skills to do X, along with the maintenance required to continue doing X. Multiply by a reasonable scaling factor in comparison to one of today's best LLMs (i.e. how many humans and how much time to do X n times, vs the LLM).

Calculate the cost of hardware (from raw elements), training and maintenance for said LLM (if you want to include the cost of research+software then you'll have to also include the costs of raising those who taught, mentored, etc the human as well). Consider that the human usually specializes, while the LLM touches everything. I think you'll find even a roughly approximate answer very enlightening if you're honest in your calculations.
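A sketch of that comparison. Every figure below is a loudly-labeled assumption chosen only to show the shape of the calculation, not a real cost estimate:

```python
# All figures are rough assumptions for illustration only.
HUMAN_ANNUAL_UPKEEP = 100_000   # salary + overhead, dollars (assumed)
WORK_HOURS_PER_YEAR = 2_000     # ~full time (assumed)
TASKS_PER_HOUR_HUMAN = 1        # one task per hour (assumed)

LLM_TRAINING_RUN = 100_000_000  # one frontier training run, dollars (assumed)
LLM_LIFETIME_TASKS = 1_000_000_000  # tasks served over model life (assumed)
LLM_INFERENCE_PER_TASK = 1.0    # dollars of inference per task (assumed)

human_per_task = HUMAN_ANNUAL_UPKEEP / (WORK_HOURS_PER_YEAR * TASKS_PER_HOUR_HUMAN)
llm_per_task = LLM_TRAINING_RUN / LLM_LIFETIME_TASKS + LLM_INFERENCE_PER_TASK

print(f"Human per task: ${human_per_task:.2f}")  # $50.00
print(f"LLM per task:   ${llm_per_task:.2f}")    # $1.10
```

The key structural point: the LLM's huge fixed training cost amortizes over every task it ever serves, while the human's upkeep is paid per unit of time regardless.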

reply
But companies don't have to bear the cost of raising a human from birth, or training them. They only pay the cost of hiring them, and that includes the cost of maintenance.

Add to that the fact that we can't blindly trust LLM output just yet, so we need a meatbag to review it.

Human + LLM will always be more expensive than the LLM alone, until we're at a stage where we can remove the human from the loop

reply
the crash would mean the price of GPUs would go down, not up...
reply
I'm thinking about how much money Anthropic etc are making from intelligence services who are running Opus 4.6 on ultra high settings 24 hours a day to find these kinds of exploits and take advantage of them before others do.

Expensive for me and you, but peanuts for a nation state.

reply
I don't buy it.

Inference cost has dropped 300x in 3 years, no reason to think this won't keep happening with improvements on models, agent architecture and hardware.
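Taking that 300x-over-3-years figure at face value, the implied annual decline is:

```python
# Implied per-year cost reduction if cost fell 300x over 3 years.
annual_factor = 300 ** (1 / 3)
print(f"~{annual_factor:.1f}x cheaper per year")  # ~6.7x cheaper per year
```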

Also, too many people are fixated on American models when Chinese ones deliver similar quality, often at a fraction of the cost.

From my tests, the "personality" of an LLM, its tendency to stick to prompts and not derail, far outweighs a low single-digit delta in benchmark performance.

Not to mention, different LLMs perform better at different tasks, and they are all particularly sensitive to prompts and instructions.

reply