These threads are always full of superstitious nonsense. Had a bad week at the AIs? Someone at Anthropic must have nerfed the model!

The roulette wheel isn't rigged; sometimes you're just unlucky. Try another spin; maybe you'll do better. Or just write your own code.

by 2001zhaozhao 13 hours ago

Start vibe-coding -> the model does wonders -> the codebase grows with low code quality -> the spaghetti code builds up to the point where the model stops working -> attempts to fix the codebase with AI actually make it worse -> complain online "model is nerfed"

by NewsaHackO 12 hours ago

I remember a guy who had three(!) Claude Max subscriptions and said he was cutting them down to one because of some trivial problem. I'm thinking: nah, you are clearly already addicted to the LLM slot machine, and I doubt you will be able to code independently of agents at this point. Anthropic has already won in your case.

by teaearlgraycold 10 hours ago

I don’t really understand the slot machine, addiction, dopamine meme with LLM coding. Yeah it’s nice when a tool saves you time. Are people addicted to CNCs, table saws, and 3D printers?

by NewsaHackO 9 hours ago

I don't use the agentic workflow (I only use LLMs for my own personal projects), but if you have ever used one, you know the rush when it solves a problem you have been struggling with for some time, especially when its solution takes an approach you never even considered, one that was baked into its knowledge base. It's like a "Eureka" moment. Of course, as you use it more and more, you get better at telling "Eureka" moments from hallucinations, but I can definitely see how some people keep chasing the rush you get when it takes 5 minutes to solve a problem that would have taken you ages (if you could solve it at all).

Another difference is the stochastic nature of LLMs. With table saws, CNC machines, and modern 3D printers, you more or less know what you are going to get. With LLMs, there is a whole element of chance: sometimes what it spits out is plainly incorrect, sometimes it is exactly what you were thinking, but when you hit the jackpot and get the nugget of info that elegantly solves the problem, you get the rush. Then you start bikeshedding your prompt/models/parameters to try to hit the jackpot again.

by fumar 4 hours ago

It is the rush of "wow it solved this." I should take a break and work on something else, but in the back of my mind "what else can it solve?" Then I come up with extra work and sometimes lose at the LLM casino.

by YZF 3 hours ago

It's fun, and you do get a dopamine rush when an LLM does something cool for you. I'm certainly feeling it as a user. Perhaps you can get the same from other tools. I would vote yes: addictive.

But it's also a tool that (can) save(s) you time.

by kakacik 9 hours ago

The dopamine rush to fix the issue super quickly, close the ticket, slack / work more?

Absolutely; I don't understand why you even ask. Humans are creatures of habit who often slip, a little or a lot, into outright addiction in one of its many forms.

by wheatbond 10 hours ago

Yes

by unshavedyak 14 hours ago

Part of me wonders if there's some subtle behavioral change at play too. Early on we distrust a model, so we give it more detail to compensate for its assumed inability, and we're blown away when it outperforms our expectations. Weeks later our expectations are more aligned with its capabilities, and so we become lazy. The model is very good, so why should we put in as much work providing specifics, specs, acceptance criteria (ACs), etc.? Then, of course, quality slides, because we assumed its capabilities somehow absolved us of the need for the same detailed guardrails (specs, ACs, etc.) for the LLM.

This scenario obviously does not apply to folks who run their own benchmarks with the same inputs across models. I'm just describing a possible, unintentional human behavioral bias.

Even if this isn't the root cause, humans are really bad at perceiving reality. Like, really, really bad. LLMs are also really difficult to measure objectively. I'm sure the combination of these two facts plays a part, possibly a significant one, in how we perceive LLM quality over time.

by mewpmewp2 13 hours ago

Still, I don't remember Claude previously trying constantly to end conversations or stop working, as in "something is too much to do", "that's enough for this session, let's leave the rest for tomorrow", "goodbye", etc. It's almost impossible to get it to do refactoring or anything like that; it's always "too massive", etc.

by darkteflon 20 minutes ago

I keep reading about this, but I have never, ever seen it. Daily Claude Max user for ~6 months. Not saying it doesn’t happen, but it’s never once happened to me.

by OccamsMirror 2 hours ago

Not to mention the number of placeholders and TODOs it leaves in the codebase before declaring that it has finished the work.

I've cancelled my subscriptions to both Codex and Claude and am going to go back to writing my own code.

When the merry-go-round of cheap high quality inference truly ends, I don't want to be caught out.

by egeozcan 3 hours ago

Even superpowers started dividing things into "phases".

"I think we can postpone this to phase 2 and start with the basics".

Meanwhile, it spends more tokens making a silly plan to divide tasks among those phases, with complicated analysis of dependency chains, deliverables, all that jazz. All unprompted.

by colordrops 2 hours ago

I thought I was tripping when I saw this. Must have been a measure to reduce usage to save them some compute.

by youoy 11 hours ago

100% agree, and I experienced that behaviour firsthand. I got confident, started giving fewer guidelines, and suddenly two weeks had passed and the LLM had left me with horrible code that looks good superficially, because I trusted it too much.

by delbronski 15 hours ago

Nah dude, that roulette wheel is 100% rigged. From top to bottom. No doubt about that. If you think they are playing fair you are either brand new to this industry, or a masochist.

by andai 9 hours ago

They don't nerf the model, just lower the default reasoning effort, encourage shorter responses in the system prompt, etc. Totally different ;)

by theptip 6 hours ago

I normally agree with this, but they objectively did lower the default effort level, and this caused people to get worse performance unexpectedly.

And it does seem likely to me that there were intermittent bugs in adaptive reasoning, based on posts here by Boris.

So all told, in this case it seems correct to say that Opus has been very flaky in its reasoning performance.

I think both of these changes were made in good faith and were reasonable in isolation, i.e., most users don’t need high-effort reasoning. But the users who do need high effort really notice the difference.

by portly 11 hours ago

Good reminder. But I also don't want to go back to pre-LLM days. Some dev activities are just too painful and boring, like correctly writing S3 policies. We need the discipline to decide what is worth our attention and what we should automate, because there is only so much mental energy we can spend each day.

by lnenad 13 hours ago

I mean they literally said on their own end that adaptive thinking isn't working as it should. They rolled it out silently, enabled by default, and haven't rolled it back.

by awwaiid 12 hours ago

It's also difficult to recognize that when it got it right THAT might have been the lucky week.

by colordrops 9 hours ago

Sorry but this is a ridiculous comment. It's not magic. There are countless levers that can be changed and ARE changed to affect quality and cost, and it's known that compute is scarce.

We aren't superstitious, you are just ignorant.

by dakolli 13 hours ago

It's because LLM companies are literally building quasi slot machines. Their UIs support this notion: for instance, you can run a multiplier on your output (x3, x4, x5), like a slot machine. Brain-fried LLM users are behaving more like gamblers every day (it's working). They have all sorts of theories about why one model is better than another, like a gambler does about a certain blackjack table or slot machine; it makes sense in their head but makes no sense on paper.

Don't use these technologies if you can't recognize this, just as a person shouldn't gamble unless they concretely understand that the house has a statistical edge and they will lose if they play long enough. You will lose if you play with LLMs long enough too; they are also statistical machines, like casino games.

This stuff is bad for the brains of a lot of people, if not everyone.

by nextaccountic 12 hours ago

I agree with the notion, except that the models are indeed different.

Someday they may converge into approximately the same thing, but then training will stop making economic sense (why spend millions to have ~the same thing?).

by leptons 12 hours ago

100% agree with this take. The more I use AI to write software, the more it looks like gambling. And it isn't stimulating my brain the way actually writing code does. I feel like my brain is starting to atrophy. I learn so much by coding things myself, and everything I learn makes me stronger. That doesn't happen with AI. Sure, I skim through what the AI produced, but not enough to really learn from it. And the next time I need to do something similar, the AI will be doing it anyway. I'm not sure I like this rabbit hole we're all going down. I suspect it doesn't lead to good things.

by dakolli 7 hours ago

It's a terrifying path we're taking: everyone's competency is going to be 1:1 correlated with the quality and quantity of tokens they can afford (or are loaned). I prefer to build by hand. I also don't think it's that much slower to do by hand, and it's much more rewarding. Sure, you can be faster if you're building slop landing pages for your hypothetical SaaS you'll never finish, but why would I want to build those things?

by leptons 2 hours ago

It's not slower to do by hand. I race the AI all the time. I give it a simple task, like writing a small script I need to finish something that is blocking me, and the "thinking" thing spins and spins. So I often just fire up a code editor and write it myself, and I'm often done before the AI is, especially since I'd otherwise have to cajole it through 10 iterations to get what I want. And when I race it, I get what I want every time, often in the same or less time than the AI takes (plus the time I have to spend cajoling it).

by SkyPuncher 9 hours ago

I agree.

I have the flexibility to shift my core working hours (and what I do during N/A business hours). Knowing they're explicitly making it dumb because of load is important: it lets me shuffle my work around and run heavy workloads late at night (plan during working hours, then come back and click "yes" a few times in the evening).

by sobellian 12 hours ago

This, plus the alchemical nature of these tools, seems to have made users pretty paranoid (I admit I'm also guilty of paranoia). Maybe there's room for a "Standard AI": "We may change the prices based on market conditions, but we always give you exactly the model you ask for."

by drewnick 15 hours ago

Hasn't Opus 4.5 been famously consistent while 4.6 was floating all over the place?

by JohnMakin 11 hours ago

I'm still on 4.5. My coworkers are describing a lot of problems I just don't have. I suspect it was some combination of the larger context window, the model itself, and various bugs like the cache miss thing reported a little while ago.

by YZF 3 hours ago

For me 4.6 has been a noticeable leap in performance from 4.5. I'm not missing 4.5 at all.

by stasomatic 14 hours ago

I am a neophyte regarding the pros and cons of each model. I am learning the ropes: writing shell scripts, a tiny Mac app, things like that.

Reading about all the “rage switching”, isn’t it prudent to use a model broker like GH Copilot with your own harness, or something like oh-my-pi? The frontier guys one-up each other monthly; it’s really tiring. I get that large corps may have contracts in place, but for an indie?

by teling 14 hours ago

Good shout. Wish they were more transparent about these 3 things.

by Barbing 10 hours ago

This is why we took business ethics & I know Dario had to too

How will your project/decision look on the front page of the Wall Street Journal? Well, when a whistleblower reveals what everyone knows ($9b->$30b rev jump without servers simultaneously growing on trees = tough decisions), it's going to be public anyway.

by kulikalov 15 hours ago

Or it could be selection bias. The ground truth is not what the HN herd complains about, but the usage stats.

by lanyard-textile 15 hours ago

I suppose I'm coming forward with my own usage stats, but it is anecdata :)

And the anecdata matches other anecdata.

Maybe I'm missing why that's selection bias.

by preommr 12 hours ago

> This comment thread is a good learner for founders;

lmao, no they shouldn't.

Public sentiment, especially on reactionary media like social media, should be taken with a huge grain of salt. I've seen overwhelming negativity toward products and companies, only for it to completely disappear or turn out to be entirely wrong.

It's like that meme showing members of a Steam group boycotting some CoD game, where you can see that a bunch of them were in-game, playing the very thing they had forsworn.

People are fickle, and their words cheap.

by lanyard-textile 11 hours ago

The internet is a stupid place with people who can't make up their mind, I don't disagree :)

But this isn't like a minor debacle about a brand. The flagship product had a severe degradation, and the parent company won't be forthcoming about it.

It's short term thinking. Congratulations, everyone still uses your product for now, but it diluted your brand.

Why take the risk when the alternative is so incredibly easy? Build engagement with your users and enjoy your loyal army.