by _boffin_ 19 hours ago

by CGamesPlay 18 hours ago

The announcement elucidated this, and it's IMO worse than this. They don't downgrade to a cheaper model ([edit] for certain classes of offense they suspect you of). They sabotage the model's outputs in other, undisclosed, ways (specifically, "prompt modification, steering vectors, or parameter-efficient fine-tuning"). So, for example, they might load in a steering vector that just forgets the API to PyTorch. But it isn't just "we redirected you to a cheaper model!"

by buildbot 17 hours ago

It honestly explains so many issues I have been having, as I used it primarily for ML research (on my personal account, doing things not related to my job, I should note). It would literally typo package names and spend huge amounts of time failing to set up simple environments… then do stupid things like set the learning rate to 1e-7 and use the eval set as training data.

by notrealyme123 14 hours ago

It burned through all of my tokens in a very short time. I wonder if their ML mitigations lead the model into deadlocks.

by peyton 16 hours ago

That’s insane. I hope they fix it.

by baq 14 hours ago

Nothing to fix. This is working as designed.

Using codex for this use case is the fix.

by sterlind 15 hours ago

just imagine if they made it sneaky. get things just subtly wrong enough that your training runs just never quite go as well as you think they should.

by razster 14 hours ago

This explains why I've been running into some odd roadblocks. Welp that sealed the deal, I'm going to be cancelling our company sub, not worth it.

by yaur 10 hours ago

Did my Claude get permanently dumber today because I asked Fable to assess my FairPlay integration?

[deleted comment, 13 hours ago]

by tfirst 19 hours ago

Their goal is to downgrade people who are violating their TOS, so I think they'd have some argument there. I have no idea how they'll deal with inevitable false positives, especially given how oversensitive most of the other triggers are.

by dannyw 18 hours ago

The challenge is that the examples they’ve mentioned (distributed training infra? ML acceleration techniques?) go beyond what’s prohibited by their ToS and act like a catch-all net.

I would wager the majority of ML and data science work in the world isn’t frontier LLM development.

by weitendorf 18 hours ago

Yes, this is the problem. These are business interests of Anthropic and have nothing to do with “safety”.

by sudoshred 17 hours ago

Safety of their IPO

by Arubis 4 hours ago

This is how I’m going to read all references to AI safety going forward. Brilliant.

by MagicMoonlight 18 hours ago

[dead]

by AussieWog93 12 hours ago

To make an analogy: Imagine a patron gets banned from ordering alcohol at a particular establishment, because they got too drunk one time.

It's completely reasonable for the establishment to reject a request for an alcoholic drink, and suggest something alcohol-free instead.

It is not reasonable for them to say "sure, here's your alcoholic drink as you requested" and give them an alcohol-free substitute without telling them.

The fact that the patron broke the rules has nothing to do with it.

by prmoustache 8 hours ago

> It is not reasonable for them to say "sure, here's your alcoholic drink as you requested" and give them an alcohol-free substitute without telling them.

Your analogy doesn't work because:

- they tell you the rules at the entrance of the bar
- they totally tell you when they give you a substitute

The only real issue is the bartender asking for your money before serving the drink, but again, customers have known this since day 1.

by staticman2 7 hours ago

Your rebuttal seems to be arguing it's okay for a bartender to simultaneously say:

"This is alcohol"

And

"Or maybe it isn't alcohol."

Or to rephrase it, "They tell you the rules at the entrance, they then tell you they don't follow those rules and they are totally serving alcohol even if they are not."

by prmoustache 2 hours ago

No, they tell you at the entrance that at any point they may unilaterally decide to replace the alcoholic drink you ordered with a non-alcoholic one.

You can decide you are okay with that or not but they aren't dishonest. I wouldn't enter that bar personally but if you do you cannot really complain. It is like complaining because you haven't won at the casino.

by ZetsuBouKyo 16 hours ago

It’s just impossible.

Look at real-life stuff like laws, company policies, or school rules. Humans have to enforce them, and we constantly see crazy cases in the news. There’s no way simple rules can ever make speech completely 'safe.' I can't prove it with math or logic yet, but I have a feeling that it’ll never happen. Even humans can't do it.

We can run a simple thought experiment here. Say Case A violates rule B, so we add rule C. Then Case D violates rule B but follows rule C, so we add an exception... and it just goes on and on like that forever. It never ends. In the end, you just get a massive pile of rules that makes it impossible to get anything done.

Ultimately, we will have to face the truth that knowledge is dangerous.

Giving knowledge directly to people who cannot actually understand it and allowing them to just use it blindly can be extremely unsafe.

To use a real-world analogy, the problem we are facing with weak AI right now is just like the debate over gun legalization. Do we want to risk the abuse of guns or knowledge just to protect the freedom to own them?

by AnthonyMouse 14 hours ago

> I can't prove it with math or logic yet, but I have a feeling that it’ll never happen.

It's not really that hard to actually prove it with math.

It's a computer, so to produce the boolean result (safe or unsafe) there has to be a mathematical formula. This formula will inherently be extremely complex, but even a very simple formula has a huge problem. Suppose "unsafe" is true if X - Y > 0. Make X and Y themselves as simple or complicated as you like but even in the simplest version it's already impossible to calculate unless the model has perfect information.

You can't calculate "X - Y" if you don't know the value of X. And it's indisputable that there is information it doesn't have. Case in point, telling you about a vulnerability in some piece of code is safe (and indeed not telling you is unsafe) if you're the developer and you want to patch it or an administrator and want to mitigate it, but the opposite if you're the attacker and want to exploit it. The model does not know which one you are, therefore it cannot make the correct determination any more than it can solve one equation with two unknowns.
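
A minimal sketch of the argument above, assuming a toy decision rule. `Request`, `is_unsafe`, `ground_truth`, and the numeric harm/benefit values are hypothetical illustrations and do not come from any real moderation system. The point it tries to show: two requests that look identical to the model can have opposite correct labels, because the deciding variable (intent) is not in the input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    prompt: str                   # the only part the model can observe
    intent: Optional[str] = None  # hidden variable: "defender" or "attacker"


def is_unsafe(harm: float, benefit: float) -> bool:
    """Toy rule from the comment: unsafe iff X - Y > 0."""
    return harm - benefit > 0


def ground_truth(req: Request) -> bool:
    """Correct label, computable only when the hidden intent is known."""
    # Hypothetical values: disclosure mostly helps a defender
    # and mostly harms when handed to an attacker.
    if req.intent == "defender":
        return is_unsafe(harm=0.1, benefit=0.9)
    if req.intent == "attacker":
        return is_unsafe(harm=0.9, benefit=0.1)
    raise ValueError("intent unknown: X and Y cannot be evaluated")


def model_view(req: Request) -> str:
    """What the classifier actually receives: the prompt, nothing else."""
    return req.prompt


prompt = "Explain the vulnerability in this function"
defender = Request(prompt, "defender")
attacker = Request(prompt, "attacker")

# Identical observable input...
same_input = model_view(defender) == model_view(attacker)
# ...but different correct answers.
labels = (ground_truth(defender), ground_truth(attacker))
```

Any function of `model_view(req)` alone must return the same answer for both requests, so it is wrong for at least one of them under these assumed values.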

by marcus_holmes 13 hours ago

This is why we have courts and juries. Creating laws that cover all cases and contexts is effectively impossible, so we have humans decide what a fair outcome would be in this specific situation.

by nativeit 13 hours ago

Imagine how many tokens Claude would burn waiting for litigation, not to mention letting it reconsider now that it understands the problem completely!

by vbezhenar 7 hours ago

Their detection is too aggressive. Just today I was trying to build a kernel for some SBC and I hit that downgrade. I had only asked some things about `make menuconfig` items. I suppose it just flags everything related to the Linux kernel as a cyber attack.

by loeg 18 hours ago

If it's a violation of ToS, just reject instead of silently downgrading.

by SR2Z 18 hours ago

But then someone would figure out some prompts that don't trigger this, and Anthropic wouldn't be able to try and disadvantage competitors.

by BoorishBears 16 hours ago

Except they openly reject many many other classes of prompts, including extremely high stakes CBRN.

It's only the direction that has direct potential business impact they've decided to sabotage instead of reject.

by kraakf06 18 hours ago

[dead]

by jchw 17 hours ago

You know, I'm not saying I don't understand what they are doing from a business perspective, but I'm just saying: DeepSeek V4 doesn't silently sabotage you because it thinks you are trying to violate a ToS. Anthropic's clawing back a bit of a moat perhaps, with Fable being an actual improvement of sorts, but now with torching user trust they are really banking on open weight models not catching up to where they are now. I wonder if they have a good reason to believe that they won't, or are hoping for something entirely different to save them.

(P.S. Yes of course I know about model censorship, a different problem, but all of the models are censored to some degree. It happens to be less of a problem for open weight models anyhow, but I figured I'd just preempt this since it's inevitable.)

I actually kinda like DSv4 over Opus 4.7 for some tasks, although I have not figured out what the deciding factor is. (Opus 4.8 so far has not worked very well for me at all, no idea why.)

by literalAardvark 16 hours ago

Anthropic seems to me to have consistently been the baddie despite everyone's posturing.

Not that I expect better from OpenAI, but at least they're not pretending to be good.

by thefounder 15 hours ago

They will give you s*t output, that’s how they deal with it. And say that less than 1% of the requests were affected. Think of this like a kind of shadow ban while you still pay top $.

by siva7 14 hours ago

I can't trust any output of Claude anymore as silent sabotage explains many things much better now.

by siva7 14 hours ago

Sabotage is a criminal offense in my jurisdiction, not the legitimate answer to a TOS violation.

by robrenaud 19 hours ago

They use a lightweight adapter to silently degrade the performance. Usually these adapters are made to improve the performance for a given domain/task.
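
For readers unfamiliar with the term, here is a minimal sketch of a generic low-rank (LoRA-style) adapter in plain Python. The helper names (`matmul`, `apply_adapter`) and the tiny matrices are made up for illustration, and nothing here reflects how any vendor actually deploys adapters. It only shows the mechanism: a small pair of matrices `B` and `A` adds a delta to frozen base weights, and whether that delta helps or hurts depends entirely on the values it was trained to hold.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_adapter(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + alpha * (B @ A); W itself is left untouched."""
    delta = matmul(b, a)
    return [
        [w[i][j] + alpha * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Frozen 2x2 base weight (illustrative values).
W = [[1.0, 0.0], [0.0, 1.0]]
# Rank-1 adapter: B is 2x1, A is 1x2, so B @ A is 2x2.
B = [[1.0], [0.0]]
A = [[0.0, 0.5]]

adapted = apply_adapter(W, B, A, alpha=2.0)    # base plus scaled low-rank delta
unchanged = apply_adapter(W, B, A, alpha=0.0)  # alpha = 0 recovers the base weights
```

Because the adapter is stored separately from `W`, it can be swapped in or out per request without retraining the base model, which is what makes the technique cheap in either direction.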

by garciasn 18 hours ago

It royally pissed me off today by just continuing with credits without stopping to ask me if I was ok with it.

Ran up $30 in extra charges while it was just flashing on the screen that it was doing that after I walked away to do something while it was humming along.

It has always just told me I ran out of usage and had to wait before. Now? You’re just gonna pay extra because you left it unattended as you’ve done for the last year of use.

by weird-eye-issue 18 hours ago

You've already explicitly enabled extra usage in your account settings, though; it is not on by default.

by garciasn 18 hours ago

Unknowingly. Is that set at the org level? Because I never set it and never had it do that before.

[deleted comment, 15 hours ago]

by throwaway7783 16 hours ago

It is at the org level

by MillionOClock 18 hours ago

Do you have Usage credits turned on in your settings?

by blurbleblurble 17 hours ago

[dead]

by golem14 12 hours ago

If the answer is yes, can you figure out when they switched models by looking at the itemized bill?
