Mirrors my sentiment. Those tools seem mostly useful as a Google alternative, for scaffolding tedious things, for code review, and as a fancy search.

It seems they've got a grip on the "coding LLM" market and are now starting to seek actual profit. I predict that from now on we'll keep seeing models that are 40%+ more expensive for a marginal performance gain.

reply
I just don't see how they'll be able to make a profit. Open models now match their performance on coding tasks. The incentives are all wrong. Why pay more for a model that's no better and also isn't open? It's nonsense.
reply
Which open model has the same performance as Opus 4.7?
reply
I wouldn't say the same, but it's pretty close. At this point I'm convinced that they'll keep running the marketing machine and that people, out of FOMO, will keep hopping onto whatever model Anthropic releases.
reply
Open models, in actual practice, don't match up to models even one or two generations behind Anthropic/OpenAI/Google's. They've clearly been trained on the benchmarks. It's entirely possible that happened by mistake, but it's definitely happening.
reply
I think that's precisely why they're paying thousands of people in those other jobs to perform their tasks while collecting new data. Software was easiest because it's already mostly written down, but other jobs can be quantified with enough data points. Just give it time.
reply
You have to guide an AI, not let it roam freely. If you have the skills to guide it, you can make it output high-quality work.
reply
> ... but it seems like Anthropic is going for the Tinder/casino intermittent reinforcement strategy: optimized to keep you spending tokens instead of achieving results.

This part of the above comment strikes me as uncharitable and overconfident. And, to be blunt, presumptuous. To claim to know a company's strategy as an outsider is messy stuff.

My prior: it is 10x to 20x more likely that Anthropic has done something other than shift to a short-term squeeze-the-customers strategy. I'd put the squeeze strategy at only around ~5%, which leaves ~95% for everything else combined.

What do I mean by "something other"? (1) One possibility is that they are having capacity and/or infrastructure problems, so model performance is degraded. (2) Another possibility is that they are not as tuned to what customers want relative to what their engineers want. (3) It is also possible that they have slowed their models down due to safety concerns; to be more specific, they are erring on the side of caution (which would be consistent with their press releases about the safety concerns of Mythos). Also, these three possibilities are not mutually exclusive.

I don't expect us (readers here) to agree on the probabilities down to the ±5% level, but I would think a large chunk of informed and reasonable people could converge to something within ±20%. At the very least, can we agree that all of these factors are strong contenders, each covering maybe 10% to 30% of the probability space?

How short-sighted, dumb, or backed-against-the-wall would Anthropic have to be to shift to a "let's make our new models intentionally _worse_ than our previous ones" strategy? Think on this. I'm not necessarily "pro" Anthropic; they could lose standing with me over time, for sure. I'm willing to think it through: what would the world have to look like for this to be the case?

There are other factors that push back against the "short-term greedy strategy" argument. Most importantly, they aren't stupid; they know customers care about quality. They are playing a longer game than that.

Yes, I understand that Opus 4.7 is not impressing people, or worse. I feel similarly based on my "feels", but I also know I haven't run benchmarks, nor have I used it very long.

I think most people viewed Opus 4.6 as a big step forward. People are somewhat conditioned to expect a newer model to be better, and Opus 4.7 doesn't match that expectation. I also know that I've been asking Claude to help me with Bayesian probabilistic modeling techniques that are well outside what I was doing a few weeks ago (detailed research and systems / software development), so it is just as likely that I'm pushing it outside its expertise.

reply
> To claim to know a company's strategy as an outsider is messy stuff.

I said "it seems like". Obviously, I have no idea whether this is an intentional strategy or not; it could just as well be a side effect of the things you mentioned.

Models being "worse" is the perceived effect for the end user (subjectively, it seems like the price to achieve the same results on similar tasks with Opus has been steadily increasing). I am claiming that Anthropic has no incentive to address this issue because of their business model (maximizing tokens spent and price per token).

reply
>>> ... but it seems like Anthropic is going for the Tinder/casino intermittent reinforcement strategy: optimized to keep you spending tokens instead of achieving results.

>> This part of the above comment strikes me as uncharitable and overconfident. And, to be blunt, presumptuous. To claim to know a company's strategy as an outsider is messy stuff.

> I said "it seems like".

Sorry, I take back the "presumptuous" part. But part of my concern remains: of all the things you chose to write, you only mentioned "the Tinder/casino intermittent reinforcement strategy". That phrase is going to draw eyeballs, and it got mine, at least. To a reader, it conveys that you think it is the most likely explanation. I'm trying to see if there is something there that I'm missing. How likely do you think it is? Do you think it is more likely than the other three possibilities I mentioned? If so, it seems like your thinking hinges on this:

> I am claiming that there is no incentive for Anthropic to address this issue because of their business model (maximize the amount of tokens spent and price per token).

First, Anthropic is not a typical profit-maximizing entity; it is a Public Benefit Corporation [1] [2]. Yes, profits still matter, but there are other factors to consider if we want to accurately predict their actions.

Second, even if profit maximization is the only incentive in play, profit-maximizing entities can plan across different time horizons. As I mentioned in my comment above, it would be rather myopic to damage their reputation with what I'd summarize as a short-term customer-squeeze strategy.

Third, like many people here on HN, I've lived in the Bay Area, and I have first-degree connections that give me high confidence (P>80%) that key leaders at Anthropic have motivations that go well beyond mere profit maximization. The AI safety mission is a huge factor. I'm not naive; that mission collides in a complicated way with the potential for FU money. But I'm confident (P>60%) that a significant fraction (>25%) of people at Anthropic are implicitly factoring in futures where we all die or lose control due to AI within ~10 to ~20 years -- in which case being filthy rich doesn't matter much.

[1]: https://law.justia.com/codes/delaware/title-8/chapter-1/subc...

[2]: https://time.com/6983420/anthropic-structure-openai-incentiv...

reply