Haiku is most definitely not fine for the codebases I work on. Sonnet is probably fine for most daily tasks, but Opus is still needed to find that pesky bug you've been chasing, or to thoroughly review your PR.
You give it 3 examples of the change you want, then ask it to do the other 87. You'll end up saving time and “money”.
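For anyone who hasn't tried it, here's a minimal sketch of that few-shot pattern through the Python SDK. The model id, the diff examples, and the src/ path are all invented for illustration; only the "show a few, ask for the rest" shape matters.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Three worked examples of the change, then "do the rest".
prompt = """Apply the same mechanical change to every remaining call site.

Example 1:
-  log.info("user=" + user.id)
+  log.info("user=%s", user.id)

Example 2:
-  log.warn("order=" + order.id)
+  log.warn("order=%s", order.id)

Example 3:
-  log.error("job=" + job.id)
+  log.error("job=%s", job.id)

Now make the same change at the other call sites under src/."""

message = client.messages.create(
    model="claude-haiku-4-5",  # hypothetical model id; use whichever tier fits
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```

The same pattern works in an agent or chat session; the examples anchor the transformation precisely enough that a small model rarely has to guess.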
Yeah, I hear that a lot, but it never comes with proof. Everyone is special.
I’m sure you’d find that Haiku is pretty functional if your usage were actually constrained.
I don't know how anyone could believe that Haiku is useful for most engineering tasks. I often hand it small tasks with well-defined boundaries in the codebase to conserve my plan limits, but half the time I end up disappointed, feeling like it cost me more time than it saved.
The differences between the models are vast. I'm not even sure how you could conclude that Haiku is usable for most work, unless your workload is very different from mine.
Most importantly, define your acceptance criteria. What do you mean by “disappointed”? That word is doing most of the heavy lifting in your anecdote. (For example, I know plenty of coders who are “disappointed” by any code they didn’t personally write, and become reflexively snobby about LLM code quality. Not saying that’s you, but I can’t rule it out, either.)
The models are not the same, but Haiku is definitely not useless, and without a lot more detail I just ignore anecdotal statements with this sort of hyperbole. To illustrate the larger point: I find something wrong with nearly everything Haiku writes, but then again, I don’t expect perfection. I’d probably get a “better” end result on most individual runs with the more expensive models, but at a vastly higher cost that the quality difference doesn’t justify.
But I'm not vibecoding; I don't let models do large work or refactorings. This is just for small, boring tasks I don't want to do.
Maybe, just maybe, the tool isn't suitable for all problem spaces.
I’m not saying that. If anything, it doesn’t matter much which model you use; it’s only a case of “you’re holding it wrong” in the sense that you still have to use your brain to write code, and if you outsource your thinking to the machine, that’s the fundamental mistake.
In other words, it’s a tool, not a magic wand. So yeah, you do have to understand how to use it, but in a fairly deterministic way, not in a mysterious woo-woo way.
You were the one who made the claim that Haiku is fine most of the time; to any reasonable person, the burden of proof is on you. Maybe you should share some high-level details about your codebase, like its stack, size, and problem domain? Maybe it's generic enough that Haiku indeed does fine for you.
I basically never just yolo large code changes, and use my taste and experience to guide the tools along. For this, Haiku is perfectly fine in nearly all circumstances.
> I’ve been using Anthropic models exclusively for the last month on a large, realistic codebase, and I can count the number of times I needed to use Opus on one hand. Most of the time, Haiku is fine. About 10% of the time I splurge for Sonnet, and honestly, even some of those are unnecessary.
You and I couldn't have more different experiences. Opus 4.7 on the max setting still gets lost and chokes on a lot of my tasks.
I switch to Sonnet for simpler tasks like refactoring, where I can lay out all of the expectations in detail, but even with Opus 4.7 I can often burn through my entire 5-hour credit limit just trying to get it to converge on a reasonable plan. This is in a medium-sized codebase.
For people putting together simple web apps, Sonnet with a mix of Haiku might be fine, but LLMs have a long way to go before even the SOTA models are trustworthy for complex tasks.
I have never had the situation you describe, where Opus won’t come up with “a reasonable plan”, but your definition of “reasonable” might be very different from mine, and of course, running through your credit limit is an entirely tangential problem.
While I agree with the sentiment, I think that habit was initially driven by older models being nerfed and/or newer ones being better at token/$. And there is this notion that the labs don't constrain a model in the first days after its release.
- If you pay for unlimited trips, will you choose the Ferrari or the old VW? Both are waiting outside your door, ready to go.
- Providers that let you choose models don't really price the lower-tier models much differently. On my grandfathered Cursor plan I pay 1x request to use Composer 2 or 2x to use Opus 4.6. Until pricing is differentiated enough that people can say "OK, yes, Opus is smarter, but paying 10x more when Haiku would do the same isn't worth it," it won't happen (rough math sketched below).
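To make that concrete, here's a minimal back-of-the-envelope sketch of the break-even math. Every number in it is a made-up placeholder (none come from this thread or any provider's price list); the point is only that retry rate and price multiplier trade off against each other.

```python
# Hypothetical cost-per-completed-task comparison between a cheap and an
# expensive model. All prices and success rates are invented placeholders.

def expected_cost(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost per completed task if failed attempts are simply retried."""
    return cost_per_attempt / success_rate

haiku_cost, haiku_success = 0.05, 0.50  # assume: 1x price, fails half the time
opus_cost, opus_success = 0.50, 0.95    # assume: 10x price, almost always works

print(f"cheap model: ${expected_cost(haiku_cost, haiku_success):.2f}/task")  # $0.10
print(f"big model:   ${expected_cost(opus_cost, opus_success):.2f}/task")    # $0.53
```

Even when the cheap model fails half the time, a 10x price gap can still leave it ahead on cost per completed task; flat 1x/2x request multipliers hide exactly this comparison.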
Obviously we’re a long way from being able to rationally evaluate whether X tokens from model Y are worth more than X tokens from model Z, let alone compare them in terms of developer cost, but that’s kind of where we need to get to; otherwise the model providers are selling magic beans rated in ineffable units of magicalness. The only rational behavior in such a world is to gorge yourself.
Remember that it's not only the cost per token but also speed. Some tasks finish faster with simpler, less-thinking models, so it might actually make sense to micromanage model choice when you have deadlines.
From a business perspective, why would I start thinking about which model to use, when I could cheaply always use the best model?
I mean at some point some people learn...
I was using Opus for the nasty stuff, or otherwise at most for planning, and then Sonnet to execute.
Buuuuut I'm dealing with a lot of nonstandard use cases and/or sloppy codebases.
Also, at work, Haiku isn't an enabled model.
But also, if I or my employer is paying for premium requests, then they should be served appropriately.
As it stands, this announcement smells of "We know our pricing was predatory, and here is the rug pull."
My other, lesser worry isn't that Opus 4.7 has a 7.5x multiplier; it's that the multiplier is quoted as an 'introductory' rate.