upvote
keep in mind that people who point out a regression and measure the actual #tok, which costs $money, aren't just "being loud" — someone diffed session context usaage and found 4.6 burning >7x the amount of context on a task that 4.5 did in under 2 MB⁣.
reply
It's not that they don't have a point, it's that everyone who's finding 4.6 to be fine or great are not running out to the internet to talk about it.
reply
Being a moderately frequent user of Opus and having spoken to people who use it actively at work for automation, it's a really expensive model to run, I've heard it burn through a company's weekend's credit allocation before Saturday morning, I think using almost an order of magnitude more tokens is a valid consumer concern!

I have yet to hear anyone say "Opus is really good value for money, a real good economic choice for us". It seems that we're trying to retrofit every possible task with SOTA AI that is still severely lacking in solid reasoning, reliability/dependability, so we throw more money at the problem (cough Opus) in the hopes that it will surpass that barrier of trust.

reply
I've also seen Opus 4.6 as a pure upgrade. In particular, it's noticeably better at debugging complex issues and navigating our internal/custom framework.
reply
Same here. 4.6 has been considerably more dilligent for me.
reply
Likewise, I feel like it's degraded in performance a bit over the last couple weeks but that's just vibes. They surely vary thinking tokens based on load on the backend, especially for subscription users.

When my subscription 4.6 is flagging I'll switch over to Corporate API version and run the same prompts and get a noticeably better solution. In the end it's hard to compare nondeterministic systems.

reply
That's very interesting!

Also, +1. Opus 4.6 is strictly better than 4.5 for me

reply
Mirrors my experience as well. Especially the pro-activeness in tool calling sticks out. It goes web searching to augment knowledge gaps on its own way more often.
reply
Do you need to upload your git for it to analyuze it? Or are they reading it off github ?
reply
They're probably running it with a claude code like tool and it has a local (to the tool, not to anthropic) copy of the git repo it can query using the cli.
reply