Haiku is most definitely not fine for the codebases I work on. Sonnet is probably fine for most daily tasks, but Opus is still needed to find that pesky bug you've been chasing, or to thoroughly review your PR.
You give it 3 examples of the change you want, then ask it to do the other 87. You'll end up saving time and “money”.
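For anyone who hasn't tried it, here's a minimal sketch of that few-shot pattern through the Python SDK. The model id, the diff examples, and the src/ path are all invented for illustration; only the "show a few, ask for the rest" shape matters.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Three worked examples of the change, then "do the rest".
prompt = """Apply the same mechanical change to every remaining call site.

Example 1:
-  log.info("user=" + user.id)
+  log.info("user=%s", user.id)

Example 2:
-  log.warn("order=" + order.id)
+  log.warn("order=%s", order.id)

Example 3:
-  log.error("job=" + job.id)
+  log.error("job=%s", job.id)

Now make the same change at the other call sites under src/."""

message = client.messages.create(
    model="claude-haiku-4-5",  # hypothetical model id; use whichever tier fits
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```

The same pattern works in an agent or chat session; the examples anchor the transformation precisely enough that a small model rarely has to guess.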
Yeah, I hear that a lot, but it never comes with proof. Everyone is special.
I’m sure you’d find that Haiku is pretty functional if your usage were actually constrained.
I don't know how anyone could believe that Haiku is useful for most engineering tasks. I often hand it small tasks with well-defined boundaries in the codebase to conserve my plan limits, but half the time I end up disappointed, feeling like it cost me more time than it saved.
The differences between the models are vast. I'm not even sure how you could conclude that Haiku is usable for most work, unless your workload is very different from mine.
Most importantly, define your acceptance criteria. What do you mean by “disappointed”? That word is doing most of the heavy lifting in your anecdote. (For example, I know plenty of coders who are “disappointed” by any code they didn’t personally write, and become reflexively snobby about LLM code quality. Not saying that’s you, but I can’t rule it out, either.)
The models are not the same, but Haiku is definitely not useless, and without a lot more detail I just ignore anecdotal statements with this sort of hyperbole. To illustrate the larger point: I find something wrong with nearly everything Haiku writes, but then again, I don’t expect perfection. I’d probably get a “better” end result on most individual runs with the more expensive models, but at a vastly higher cost that the quality difference doesn’t justify.
But I'm not vibecoding; I don't let models do large work or refactorings. This is just for small, boring tasks I don't want to do.
Maybe, just maybe, the tool isn't suitable for all problem spaces.
I’m not saying that. If anything, it doesn’t matter much which model you use; it’s only a case of “you’re holding it wrong” in the sense that you still have to use your brain to write code, and if you outsource your thinking to the machine, that’s the fundamental mistake.
In other words, it’s a tool, not a magic wand. So yeah, you do have to understand how to use it, but in a fairly deterministic way, not in a mysterious woo-woo way.
You were the one who made the claim that Haiku is fine most of the time; to any reasonable person, the burden of proof is on you. Maybe you should share some high-level details about your codebase, like its stack, size, and problem domain? Maybe it's generic enough that Haiku indeed does fine for you.
I basically never just yolo large code changes, and use my taste and experience to guide the tools along. For this, Haiku is perfectly fine in nearly all circumstances.
> I’ve been using Anthropic models exclusively for the last month on a large, realistic codebase, and I can count the number of times I needed to use Opus on one hand. Most of the time, Haiku is fine. About 10% of the time I splurge for Sonnet, and honestly, even some of those are unnecessary.
You and I couldn't have more different experiences. Opus 4.7 on the max setting still gets lost and chokes on a lot of my tasks.
I switch to Sonnet for simpler tasks like refactoring, where I can lay out all of the expectations in detail, but even with Opus 4.7 I can often burn through my entire 5-hour credit limit just trying to get it to converge on a reasonable plan. This is in a medium-sized codebase.
For people putting together simple web apps, Sonnet with a mix of Haiku might be fine, but LLMs have a long way to go before even the SOTA models are trustworthy for complex tasks.
I have never had the situation you describe, where Opus won’t come up with “a reasonable plan”, but your definition of “reasonable” might be very different from mine, and of course, running through your credit limit is an entirely tangential problem.
While I agree with the sentiment, I think that habit was initially driven by older models being nerfed and/or newer ones being better at token/$. And there is this notion that the labs don't constrain a model in the first days after its release.
- If you pay for unlimited trips, will you choose the Ferrari or the old VW? Both are waiting outside your door, ready to go.
- Providers that let you choose models don't really price the lower-tier models much differently. On my grandfathered Cursor plan I pay 1x request to use Composer 2 or 2x to use Opus 4.6. Until pricing is differentiated enough that people can say "OK, yes, Opus is smarter, but paying 10x more when Haiku would do the same isn't worth it," it won't happen (rough math sketched below).
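To make that concrete, here's a minimal back-of-the-envelope sketch of the break-even math. Every number in it is a made-up placeholder (none come from this thread or any provider's price list); the point is only that retry rate and price multiplier trade off against each other.

```python
# Hypothetical cost-per-completed-task comparison between a cheap and an
# expensive model. All prices and success rates are invented placeholders.

def expected_cost(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost per completed task if failed attempts are simply retried."""
    return cost_per_attempt / success_rate

haiku_cost, haiku_success = 0.05, 0.50  # assume: 1x price, fails half the time
opus_cost, opus_success = 0.50, 0.95    # assume: 10x price, almost always works

print(f"cheap model: ${expected_cost(haiku_cost, haiku_success):.2f}/task")  # $0.10
print(f"big model:   ${expected_cost(opus_cost, opus_success):.2f}/task")    # $0.53
```

Even when the cheap model fails half the time, a 10x price gap can still leave it ahead on cost per completed task; flat 1x/2x request multipliers hide exactly this comparison.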
Obviously we’re a long way from being able to rationally evaluate whether X tokens from model Y are worth more than X tokens from model Z, let alone compare them in terms of developer cost, but that’s kind of where we need to get to; otherwise the model providers are selling magic beans rated in ineffable units of magicalness. The only rational behavior in such a world is to gorge yourself.
Remember that it's not only the cost per token but also speed. Some tasks finish faster with simpler, less-thinking models, so it might actually make sense to micromanage model choice when you have deadlines.
From a business perspective, why would I start thinking about which model to use, when I could cheaply always use the best model?
I mean at some point some people learn...
I was using Opus for the nasty stuff, or otherwise at most for planning, and then Sonnet to execute.
Buuuuut I'm dealing with a lot of nonstandard use cases and/or sloppy codebases.
Also, at work, Haiku isn't an enabled model.
But also, if I or my employer is paying for premium requests, then they should be served appropriately.
As it stands, this announcement smells of "We know our pricing was predatory, and here is the rug pull."
My other, lesser worry isn't that Opus 4.7 has a 7.5x multiplier; it's that the multiplier is quoted as an 'introductory' rate.