(* explained at https://news.ycombinator.com/item?id=26998308)
The power dynamics are also vastly against me. I represent a fraction of my employer's labour, but my employer represents 100% of my income.
That dynamic is totally inverted with AI. You are a rounding error on their revenue sheet, they have a monopoly on your work throughput. How do you budget an workforce that could turn 20% more expensive overnight?
If you're talking about output quality, then yeah, that's not as easy. But for product outputs (building a customer service agent or something like that), having a well-designed eval harness and doing testing and iteration can get you some degree of convergence between the models of similar generations. Coding is similar (iterate, measure), but less easy to eval.
It is transferable-yes, you will get issues if you take prompts and workflows tuned for one model and send them to another unchanged. But, most of the time, fixing it is just tinkering with some prompt templates
People port solutions between models all the time. It takes some work, but the amount of work involved is tractable
Plus: this is absolutely the kind of task a coding agent can accelerate
The biggest risk is if your solution is at the frontier of capability, and a competing model (even another frontier model) just can’t do it. But a lot of use cases, that isn’t the case. And even if that is the case today, decent odds in a few more months it won’t be
This is why there are a ton of corps running the open source models in house... Known costs, known performance, upgrade as you see fit. The consumer backlash against 4o was noted by a few orgs, and they saw the writing on the wall... they didnt want to develop against a platform built on quicksand (see openweb, apps on Facebook and a host of other examples).
There are people out there making smart AI business decisions, to have control over performance and costs.
If you've got something to share I'd love to see it.
This is an architecture that people are increasing begging to give network connectivity that can't differentiate its system prompt from user input
I'd also flip your framing on its head. One of the advantages of human labor over agents is accountability. Someone needs to own the work at the end of the day, and the incentive alignment is stronger for humans given that there is a real cost to being fired.
I think we're reaching the point where more developers need to start right-sizing the model and effort level to the task. It was easy to get comfortable with using the best model at the highest setting for everything for a while, but as the models continue to scale and reasoning token budgets grow, that's no longer a safe default unless you have unlimited budgets.
I welcome the idea of having multiple points on this curve that I can choose from. depending on the task. I'd welcome an option to have an even larger model that I could pull out for complex and important tasks, even if I had to let it run for 60 minutes in the background and made my entire 5-hour token quota disappear in one question.
I know not everyone wants this mental overhead, though. I predict we'll see more attempts at smart routing to different models depending on the task, along with the predictable complaints from everyone when the results are less than predictable.
For a while I used Cerebras Code for 50 USD a month with them running a GLM model and giving you millions of tokens per day. It did a lot of heavy lifting in a software migration I was doing at the time (and made it DOABLE in the first place), BUT there were about 10 different places where the migration got fucked up and had to manually be fixed - files left over after refactoring (what's worse, duplicated ones basically), some constants and routes that are dead code, some development pages that weren't removed when they were superseded by others and so on.
I would say that Claude Code with throwing Opus at most problems (and it using Sonnet or Haiku for sub-agents for simple and well specified tasks) is actually way better, simply because it fucks things up less often and review iterations at least catch when things are going wrong like that. Worse models (and pretty much every one that I can afford to launch locally, even ones that need around ~80 GB of VRAM in the context of an org wanting to self-host stuff) will be confidently wrong and place time bombs in your codebases that you won't even be aware of if you don't pay enough attention to everything - even when the task was rote bullshit that any model worth its salt should have resolved with 0 issues.
My fear is that models that would let me truly be as productive as I want with any degree of confidence might be Mythos tier and the economics of that just wouldn't work out.
For handing work off to an LLM in large chunks, picking the best model available is the only way to go right now.
I’m curious how to even do it. I have no idea how to choose which model to use in advance of a given task, regardless of the mental overhead.
And unless you can predict perfectly what you need, there’s going to be some overuse due to choosing the wrong model and having to redo some work with a better model, I assume?
Even EMs and TPMs are assigning people based on their previous experience, which generally boils down to "i've seen this task before and I know what's involved," "this task is small, and I know what's involved," or "this task is too big and needs to be understood better."
That's how things worked pre-AI, and old problems are new problems again.
When you run any bigger project, you have senior folks who tackle hardest parts of it, experienced folks who can churn out massive amounts of code, junior folks who target smaller/simpler/better scoped problems, etc.
We don't default to tell the most senior engineer "you solve all of those problems". But they're often involved in evaluation/scoping down/breakdown of problem/supervising/correcting/etc.
There's tons of analogies and decades of industry experience to apply here.
I'm not saying that can't be done, but taking a large task that hasn't been broken down needs, you guessed it, a powerful agent. that's your senior engineer who can figure out the rote parts, the medium parts, and the thorny parts.
the goal isn't to have an engineer do that. we should still be throwing powerful agents at a problem, they should just be delegating the work more efficiently.
throwing either an engineer or an agent at any unexplored work means you just have to delegate the most experienced resource to, or suffer the consequences.
So there's a push for them to increase revenue per user, which brings us closer to the real cost of running these models.
At that point you are beholden to your shareholders and no longer can eschew profit in favor of ethics.
Unfortunately, I think this is the beginning of the end of Anthropic and Modei being a company and CEO you could actually get behind and believe that they were trying to do "the right thing".
It will become an increasingly more cutthroat competition between Anthropic and OpenAI (and perhaps Google eventually if they can close the gap between their frontier models and Claude/GPT) to win market share and revenue.
Perhaps Amodei will eventually leave Anthropic too and start yet another AI startup because of Anthropic's seemingly inevitable prioritization of profit over safety.
Just how if Boeing was able to release a supersonic plane that was also twice as efficient tomorrow; it'd destroy any airline that was deep in debt for its current "now worthless" planes.
No not really, you can issue two types of shares, the company founders can control a type of shares which has more voting power while other shareholders can get a different type of shares with less voting power.
Facebook, Google has something similar.
A publicly traded company is legally obligated to go against the global good.
Call me an optimist, but I'm still holding out hope that Amodei is and still can do the right thing. That hope is fading fast though.
So no matter what, if you do something lots of people like (and hence compensate you for), you will be evil.
It's a very interesting quirk of human intuition.
Can't blame someone who comes to such a conclusion about money and power.
Yet here they are, often considered on of the most evil companies on Earth. That's the interesting quirk.
Can you explain what you mean by this? I disagree but I don't understand how you think Google did this so I am very curious.
For my part, I started using the internet before Google, and I strongly hold the opinion that Google's greatest contribution to the internet was utterly destroying its peer to peer, free, open exchange model by being the largest proponent of centralizing and corporatizing the web.
Surely you have to recognize the inconsistency of saying that Google "corporatized" the web, while the vast majority of people using google have never paid them anything. In fact many don't even load their ads or trackers, and still main a gmail account.
If we put on balance good things and evil things google has done, with honest intention, I struggle very hard to counter "gave the third world a full suite of computer programs and access to endless video knowledge for free with nothing more than dumpy hardware", while the evil is "conspired with credit card companies to find out what you are buying".
This might come off like I am just glazing google. But the point I am trying to illuminate is that when there is big money at play, people knee-jerk associate it with evil, and throw all nuance out the window.
Besides, IRC still exists for you and anyone else to use. Totally google free.
There’s several subjects to go into here and HN probably isn’t the best place for the amount of detail this discussion requires but I will just note the amount of people blocking Google’s ads and trackers is negligible and has significantly shrunk in the mobile first era.
The wave is shifting to other corporations now but for a good while most of the internet was architected to give Google money. Remember SEO? An entire practice of web publishing centered around Google’s profit share. That hasn’t disappeared- it’s just evolved and transformed into more ingrained rent-seeking.
I was about to call it reselling but so many startups with their fingers in the tech startup pie offer containerised cloud compute akin to a loss leader. Harking back to the old days of buying clock time on a mainframe except you're getting it for free for a while.
Like, Apple computers are already quite pricey -- $1000 or $2000 or so for a decent one. But you can spec up one that’s a bit better (not really that much better) and they’ll charge you $10K, $20K, $30K. Some customers want that and many are willing to pay for it.
Is there an equivalent ultra-high-end LLM you can have if you’re willing to pay? Or does it not exist because it would cost too much to train?
I guess at the time that was GPT-4.5. I don't think people used it a lot because it was crazy expensive, and not that much better than the rest of the crop.
So, for agentic workflows - ones where the model gets feedback from tools, etc…, fast enough is important.
I'd rather be surprised if they are still doing business by then.
I’m guessing we’re gonna have a world like working on cars - most people won’t have expensive tools (ex a full hydraulic lift) for personal stuff, they are gonna have to make do with lesser tools.
i bought a $3k AMD395+ under the Sam Altman price hike and its got a local model that readily accomplishes medial tasks.
theres a ceiling to these price hikes because open weights will keep popping up as competitors tey to advertise their wares.
sure, we POV different capabilities but theres definitely not that much cash in propfietary models for their indererminance
Or they are just not willing to burn obscene levels of capital like OpenAI.