
Keep in mind that people who point out a regression and measure the actual token count, which costs real money, aren't just "being loud". Someone diffed session context usage and found 4.6 burning more than 7x the context on a task that 4.5 did in under 2 MB.

by svachalek 3 hours ago

It's not that they don't have a point, it's that everyone who's finding 4.6 to be fine or great are not running out to the internet to talk about it.

by marcus_cemes 2 hours ago

As a moderately frequent user of Opus who has also talked to people using it actively at work for automation: it's a really expensive model to run. I've heard of it burning through a company's weekend credit allocation before Saturday morning. Using almost an order of magnitude more tokens is a valid consumer concern!

I have yet to hear anyone say "Opus is really good value for money, a really good economic choice for us". It seems we're trying to retrofit every possible task with SOTA AI that still severely lacks solid reasoning and reliability, so we throw more money at the problem (cough Opus) in the hope that it will break through that barrier of trust.

by SatvikBeri 4 hours ago

I've also seen Opus 4.6 as a pure upgrade. In particular, it's noticeably better at debugging complex issues and navigating our internal/custom framework.

by drcongo 4 hours ago

Same here. 4.6 has been considerably more diligent for me.

by AustinDev 3 hours ago

Likewise, though I feel like its performance has degraded a bit over the last couple of weeks, but that's just vibes. They surely vary thinking tokens based on backend load, especially for subscription users.

When my subscription 4.6 is flagging, I'll switch over to the corporate API version, run the same prompts, and get a noticeably better solution. In the end, it's hard to compare nondeterministic systems.

by merlindru 7 minutes ago

That's very interesting!

Also, +1. Opus 4.6 is strictly better than 4.5 for me

by perelin 4 hours ago

Mirrors my experience as well. The proactiveness in tool calling especially sticks out: it searches the web on its own to fill knowledge gaps far more often.

by galaxyLogic 3 hours ago

Do you need to upload your git repo for it to analyze it? Or are they reading it off GitHub?

by gpm 1 hour ago

They're probably running it with a Claude Code-like tool that has a local (local to the tool, not to Anthropic) copy of the git repo, which it can query using the CLI.
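For anyone curious what "query the local repo using the CLI" can look like, here's a minimal sketch in Python. It is not how Claude Code actually implements this. The `run_git` / `build_git_command` names and the read-only allowlist are hypothetical, and it assumes `git` is on the PATH. The point is that nothing is uploaded: the tool shells out to `git -C <path>` against a clone that already lives on your machine.

```python
import subprocess
from pathlib import Path

# Hypothetical allowlist: read-only git subcommands an agent tool might expose.
READ_ONLY_SUBCOMMANDS = {"log", "show", "diff", "grep", "blame", "ls-files", "status"}


def build_git_command(repo_path: str, args: list[str]) -> list[str]:
    """Build a git invocation scoped to a local clone, rejecting non-allowlisted subcommands."""
    if not args or args[0] not in READ_ONLY_SUBCOMMANDS:
        raise ValueError(f"subcommand not allowed: {args[:1]}")
    # `git -C <path>` runs git as if it were started in <path>,
    # so every query hits the local checkout, not a remote like GitHub.
    return ["git", "-C", str(Path(repo_path)), *args]


def run_git(repo_path: str, args: list[str], timeout: float = 30.0) -> str:
    """Run the command and return stdout; raises CalledProcessError on a non-zero exit."""
    cmd = build_git_command(repo_path, args)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout
```

Usage would be something like `run_git("/path/to/checkout", ["log", "--oneline", "-n", "5"])`, with the output fed back to the model as a tool result. The allowlist is there so a model-chosen command can't `push` or rewrite history.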