Seconded. After disabling adaptive thinking and using a default higher thinking, I finally got the quality I'm looking for out of Opus 4.6, and I'm pleased with what I see so far in Opus 4.7.

Whatever their internal evals say about adaptive thinking, they're measuring the wrong thing.

by hbbio 7 hours ago

Unless they're measuring capex

by JamesSwift 7 hours ago

It's even more maddening for me because my whole team is paying direct API pricing for the privilege of this experience! Just charge me the cost and let me tune this thing, sheesh!

by pojzon 21 minutes ago

If you had to pay $X to $YY per request (because that's the real cost for Anthropic), I strongly believe the AI train would suddenly derail.

Currently we are all subsidized by investors' money.

How long can you run a business that only loses money? At some point prices will go up and this will be the end of this escapade.

by manmal 2 hours ago

Why don’t you switch to codex? The grass is greener here. Do use 5.3-codex though, 5.4 is not for coding, despite what many say.

by echelon 7 hours ago

That's why they put the cute animal in your terminal.

by SV_BubbleTime 3 hours ago

Ok, side topic… but that little bastard cheerfully told me out of nowhere that I have a malloc without a null check AND a free inside a conditional that might not get called.

It didn’t give me a line number or file. I had to go investigate. Finally found what it was talking about.

It was wrong. It took me about 20 minutes start to finish.

Turned it off and will not be turning it back on.

by darkwater 1 hour ago

I thought it just emitted tongue-in-cheek comments, not serious analysis. And I use the past tense because I had it enabled explicitly and a few days ago it disappeared by itself; I didn't touch anything.

by ai_slop_hater 13 hours ago

This matches my experience as well, "adaptive thinking" chooses to not think when it should.

by andai 9 hours ago

I think this might be an unsolved problem. When GPT-5 came out, they had a "router" (classifier?) decide whether to use the thinking model or not.

It was terrible. You could upload 30 pages of financial documents and it would decide "yeah this doesn't require reasoning." They improved it a lot but it still makes mistakes constantly.

I assume something similar is happening in this case.

by solarkraft 6 hours ago

I find that GPT 5.4 is okay at it. It does think harder for harder problems and still answers quickly for simpler ones, IME.

by nomel 6 hours ago

Is knowing how hard a problem is, before doing it, solved in humans?

by biglost 6 hours ago

Yes, every week when assigning fking points to tasks on jira/s

by arthurcolle 4 hours ago

As a unit this is funny, Jira points assigned per second (now possible with parallel tool calling AIs)

by WobblyDev 1 hour ago

[dead]

by mochomocha 4 hours ago

It makes me think of this parallel: often in combinatorial optimization, estimating whether it is hard to find a solution to a problem costs you as much as solving it.

With a small bounded compute budget, you're going to sometimes make mistakes with your router/thinking switch. Same with speculative decoding, branch predictors etc.

by ai_slop_hater 4 hours ago

Maybe it is an unsolved problem, but either way I am confused why Anthropic is pushing adaptive thinking so hard, making it the only option on their latest models. To combat how unreliable it is, they set thinking effort to "high" by default in the API. In Claude Code, they now set it to "xhigh" by default. The fact that you cannot even inspect the thinking blocks to try and understand its behavior doesn't help. I know they throw around instructions on how to enable thinking blocks, or blocks with thinking summaries, or whatever (by now I am too confused about what exactly they allow us to see), but nothing worked for me so far.

by siva7 2 hours ago

Because with adaptive thinking they control compute, not you

by rrvsh 8 hours ago

[dead]

by Moonye666 2 hours ago

[dead]

by azrollin 8 hours ago

[dead]

by whateveracct 13 hours ago

you're using a proprietary blackbox

by JamesSwift 13 hours ago

Sure, but that blackbox was giving me a lot of value last month.

by mrandish 10 hours ago

Me too, but it was obviously wildly unsustainable. I was telling friends at xmas to enjoy all the subsidized and free compute funded by VC dollars while they can because it'll be gone soon.

With the fully-loaded cost of even an entry-level 1st year developer over $100k, coding agents are still a good value if they increase that entry-level dev's net usable output by 10%. Even at >$500/mo it's still cheaper than the health care contribution for that employee. And, as of today, even coding-AI-skeptics agree SoTA coding agents can deliver at least 10% greater productivity on average for an entry-level developer (after some adaptation). If we're talking about Jeff Dean/Sanjay Ghemawat-level coders, then opinions vary wildly.

Even if coding agents didn't burn astronomical amounts of scarce compute, it was always clear the leading companies would stop incinerating capital buying market share and start pushing costs up to capture the majority of the value being delivered. As a recently retired guy, vibe-coding was a fun casual hobby for a few months but now that the VC-funded party is winding down, I'll just move on to the next hobby on the stack. As the costs-to-actual-value double and then double again, it'll be interesting to see how many of the $25/mo and free-tier users convert to >$2500/yr long-term customers. I suspect some CFOs' spreadsheets are over-optimistic regarding conversion/retention ARPU as price-to-value escalates.
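The break-even arithmetic in this comment (a $100k fully loaded developer, a 10% gain in usable output, a $500/mo agent) can be sketched roughly as follows. The function names are hypothetical and the inputs are only the figures quoted above, not measured data.

```python
def annual_value_of_gain(fully_loaded_cost: float, output_gain: float) -> float:
    """Dollar value per year of a fractional gain in a developer's output."""
    return fully_loaded_cost * output_gain


def annual_agent_cost(monthly_price: float) -> float:
    """Yearly spend on a coding agent billed monthly."""
    return monthly_price * 12


def break_even_gain(fully_loaded_cost: float, monthly_price: float) -> float:
    """Fractional output gain at which the agent pays for itself."""
    return annual_agent_cost(monthly_price) / fully_loaded_cost


# Figures taken from the comment: $100k/yr developer, 10% gain, $500/mo agent.
value = annual_value_of_gain(100_000, 0.10)
cost = annual_agent_cost(500)
threshold = break_even_gain(100_000, 500)

print(f"value of gain: ${value:,.0f}/yr, agent cost: ${cost:,.0f}/yr")
print(f"break-even output gain: {threshold:.0%}")
```

Under these assumptions the agent costs $6,000 a year against roughly $10,000 of added output, so it breaks even at about a 6% gain. Each doubling of price doubles that threshold, which is the squeeze the comment anticipates.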

by whateveracct 13 hours ago

so it's also a skinner box

by slopinthebag 12 hours ago

Whoops haha. Surely that can't be how black boxes normally work right?

by butlike 12 hours ago

And now it isn't. Pray they don't alter the deal any further.

by retinaros 13 hours ago

it's a drug. that is how it works. they ration it before the new stuff. seeing legends of programming shilling it pains me the most. so far there are a few decent, non-insane public people talking about it: Mitchell Hashimoto, Jeremy Howard, Casey Muratori. hell, even DHH drank the Kool-Aid, while most of his interviews in the past years were about how he moved away from AWS and reduced the bill from 3 million to 1 million by basically loosing 9s, resiliency and availability. but it seems he is fine with loosing what makes his business work (programming) to a company that sells overpowered Stack Overflow slot machines.

by heurist 12 hours ago

I work with some 'legends of programming' and they're all excited about it. I am too, though I am not a legend. It really is changing the game as a valid new technology, and it's not just a 'slot machine'. Anthropic is burning their goodwill though with their lack of QA or intentional silent degradation.

by retinaros 11 hours ago

it is a slot machine. you win a lot if what you do is in the dataset. and yes, most enterprise software is likely in it, as it is quite basic CRUD API/WebUI. the winning doesn't change the fact that it is a slot machine and you just need one big loss to end your work.

as long as you introduce plans you introduce a push to optimize for cost vs quality. that is what burnt cursor before CC and Codex. They now will be too. Then one day everything will be remote on OAI and Anthropic servers, and there won't be a way to tell what is happening behind the scenes. Claude Code is already at this level. Showing stuff like "Improvising..." while hiding CoT and adding a bunch of features as quick as they can.

by NobleLie 8 hours ago

The question is, are you getting value from your setups or not?

by dyauspitr 12 hours ago

The fact that they might gimp it in the future doesn’t mean it doesn’t offer very real value right now. If you’re not using an LLM to code, you’re basically a dinosaur now. You’re forcing yourself to walk while everyone else is in a vehicle, and a good vehicle at that, one that gets you to your destination in one piece.

by retinaros 11 hours ago

as an overpowered stack overflow machine this is quite good and a huge jump. As a prompt to code generator with yolo mode (the one advertised by those companies) it alternates between good and trash, and every single person who works away from the distribution of the SFT dataset knows this. I understand that this dataset is huge tho and I can see the value in it. I just think in the long term it brings more negatives.

If you vibecode CRUD APIs and react/shadcn UIs then I understand it might look amazing.

by dyauspitr 11 hours ago

Yes, definitely CRUDs but also iPhone applications, highly performant financial software (its kdb queries are better than 95% of humans), database structure and querying and embedded systems are other things it’s surprisingly good at. When you take all of those into account there’s very little else left.

by throwaway9980 12 hours ago

[flagged]

by bloppe 12 hours ago

I think you're loosing your ability to spell

by retinaros 12 hours ago

never said he was a looser. just that his take on genAi coding doesnt align with his previous battles for freedom away from Cloud. OAI and Anthropic have a stronger lock in than any cloud infra company.

you got everything to loose by giving your knowledge and job to closedAI and anthropic.

just look at markets like office suite to understand how the end plays.

by bloppe 11 hours ago

Is office suite supposed to be an example of lock-in? I haven't used it since middle school. I've worked at 3 companies and, to the best of my knowledge, not a single person at any of them used office suite. That's not to say we use pen and paper. We just use google docs, or notion, or (my personal favorite) just markdown and possibly LaTeX.

I think it's somewhat analogous with models. Sure, you could bind yourself to a bunch of bespoke features, but that's probably a bad idea. Try to make it as easy as possible for yourself to swap out models and even use open-weight models if you ever need to.

You will get locked into the technology in general, though, just not a particular vendor's product.
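One way to keep models easy to swap, as suggested above, is to have application code depend on a small interface rather than on any vendor SDK. The sketch below is a minimal illustration under that assumption: `TextModel`, `EchoModel`, `UpperModel` and `summarize` are hypothetical names, and the two backends are stand-ins, not real vendor clients.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only surface the application is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in backend for illustration; a real one would wrap a vendor SDK."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """A second stand-in, e.g. a locally hosted open-weight model."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def summarize(model: TextModel, text: str) -> str:
    # Application logic sees only TextModel, so swapping vendors is a
    # one-line change at the call site rather than a rewrite.
    return model.complete(f"Summarize: {text}")


print(summarize(EchoModel(), "release notes"))
print(summarize(UpperModel(), "release notes"))
```

The trade-off is the one the comment describes: vendor-specific features stay out of reach unless they are added to the shared interface, which is the price of avoiding lock-in to a single product.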

by throwaway9980 12 hours ago

Those jobs are as good as loost already. There's no endgame where knowledge workers keep knowledge working the way they have been knowledge working. Adapt or be a loosing looser forever.

by jibal 9 hours ago

loser

(Didn't you notice being mocked for the spelling error?)

by chinathrow 12 hours ago

paying for - so some form of return is expected.

by whateveracct 12 hours ago

the issue is the return is amorphous and unstructured

there's no contract. you send a bunch of text in (context etc) and it gives you some freeform text out.

by chinathrow 12 hours ago

Sure, but I pay real money both to Antrophic and to JetBrains. I get a shitty in line completion full of random garbage or I get correct predictions. I ask Junie (the JetBrains agent) to do a task and it wanders off in a direction I have no idea why I pay for that.

by SyneRyder 12 hours ago

> Sure, but I pay real money both to Antrophic...

I misread that as Atrophic. I hope that doesn't catch on...

by gowld 12 hours ago

> I have no idea why I pay for that.

And Claude has no idea why it did that.

by chinathrow 12 hours ago

Exactly, and we feel vindicated when it works but sold when it fails. Something will have to change.

by iterateoften 13 hours ago

It’s the official communication that sucks. It’s one thing for the product to be a black box if you can trust the company. But time and time again Boris lies and gaslights about what’s broken, a bug or intentional.

by CodingJeebus 13 hours ago

> It’s the official communication that sucks. It’s one thing for the product to be a black box if you can trust the company.

A company providing a black box offering is telling you very clearly not to place too much trust in them because it's harder to nail them down when they shift the implementation from under one's feet. It's one of my biggest gripes about frontier models: you have no verifiable way to know how the models you're using change from day to day because they very intentionally do not want you to know that. The black box is a feature for them.

by bomewish 12 hours ago

If you cared so bad you could make your own evals.
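A home-grown eval of the kind suggested here could be as small as a fixed set of prompts with pass/fail checks, rerun on a schedule and compared against a stored baseline. The sketch below assumes the model is any callable from prompt to text; `stub_model`, `CASES` and the 5% tolerance are illustrative choices, not anything taken from the thread.

```python
from typing import Callable

# Each case pairs a prompt with a check that decides whether the output passes.
EvalCase = tuple[str, Callable[[str], bool]]


def pass_rate(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases whose output satisfies its check."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def regressed(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a drop larger than the tolerance relative to a stored baseline."""
    return current < baseline - tolerance


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in; replace with a call to whichever model you use.
    answers = {"2+2=": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "")


CASES: list[EvalCase] = [
    ("2+2=", lambda out: out.strip() == "4"),
    ("Capital of France?", lambda out: "Paris" in out),
    ("Reverse 'abc'", lambda out: out.strip() == "cba"),
]

rate = pass_rate(stub_model, CASES)
print(f"pass rate: {rate:.2f}, regressed vs 1.0 baseline: {regressed(rate, 1.0)}")
```

Because model output is not deterministic, a real harness would likely run each case several times and compare averages; this sketch only shows the shape of the loop.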

by whateveracct 12 hours ago

so pay anthropic money to maybe detect when the model is on a down week? lol