Do they adjust the price of the API request so that only the tokens actually handled by Fable are charged at Fable's price, and the remaining tokens, handled by the cheaper / nerfed (Fable) model, are charged at a correspondingly lower price?
If the answer is no, could that be construed as fraud?
Using codex for this use case is the fix.
I would wager the majority of ML and data science work in the world isn’t frontier LLM development.
It's completely reasonable for the establishment to reject a request for an alcoholic drink, and suggest something alcohol-free instead.
It is not reasonable for them to say "sure, here's your alcoholic drink as you requested" and give them an alcohol-free substitute without telling them.
The fact that the patron broke the rules has nothing to do with it.
Your analogy doesn't work because:
- they tell you the rules at the entrance of the bar
- they totally tell you when they give you a substitute
The only real issue is the bartender asking for your money before serving you the drink, but again, customers have known this since day 1.
"This is alcohol"
And
"Or maybe it isn't alcohol."
Or to rephrase it, "They tell you the rules at the entrance, they then tell you they don't follow those rules and they are totally serving alcohol even if they are not."
You can decide you are okay with that or not but they aren't dishonest. I wouldn't enter that bar personally but if you do you cannot really complain. It is like complaining because you haven't won at the casino.
Look at real-life stuff like laws, company policies, or school rules. Humans have to enforce them, and we constantly see crazy cases in the news. There’s no way simple rules can ever make speech completely 'safe.' I can't prove it with math or logic yet, but I have a feeling that it’ll never happen. Even humans can't do it.
We can run a simple thought experiment here. Say Case A violates rule B, so we add rule C. Then Case D violates rule B but follows rule C, so we add an exception... and it just goes on and on like that forever. It never ends. In the end, you just get a massive pile of rules that makes it impossible to get anything done.
Ultimately, we will have to face the truth that knowledge is dangerous.
Giving knowledge directly to people who cannot actually understand it and allowing them to just use it blindly can be extremely unsafe.
To use a real-world analogy, the problem we are facing with weak AI right now is just like the debate over gun legalization. Do we want to risk the abuse of guns or knowledge just to protect the freedom to own them?
It's not really that hard to actually prove it with math.
It's a computer, so to produce the boolean result (safe or unsafe) there has to be a mathematical formula. This formula will inherently be extremely complex, but even a very simple formula has a huge problem. Suppose "unsafe" is true if X - Y > 0. Make X and Y themselves as simple or complicated as you like but even in the simplest version it's already impossible to calculate unless the model has perfect information.
You can't calculate "X - Y" if you don't know the value of X. And it's indisputable that there is information it doesn't have. Case in point, telling you about a vulnerability in some piece of code is safe (and indeed not telling you is unsafe) if you're the developer and you want to patch it or an administrator and want to mitigate it, but the opposite if you're the attacker and want to exploit it. The model does not know which one you are, therefore it cannot make the correct determination any more than it can solve one equation with two unknowns.
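To make the "one equation, two unknowns" point concrete, here is a minimal sketch. Everything in it is hypothetical: the `Intent` and `Label` types, the `trueLabel` rule, and the example prompt are assumptions made up for illustration, not how any real classifier works. It only shows that when the correct label depends on something the classifier cannot see, any function of the prompt alone gets at least one asker wrong.

```typescript
// Hypothetical illustration: the "correct" answer depends on hidden intent.
type Intent = "defender" | "attacker";
type Label = "safe" | "unsafe";

// Assumed ground truth for this sketch: disclosing a vulnerability is
// safe for a defender and unsafe for an attacker.
function trueLabel(intent: Intent): Label {
  return intent === "defender" ? "safe" : "unsafe";
}

// A classifier only ever sees the prompt, never the intent.
type Classifier = (prompt: string) => Label;

// Count how many of the two possible askers a classifier gets wrong
// when both send exactly the same prompt.
function errorsOnSamePrompt(classify: Classifier, prompt: string): number {
  const intents: Intent[] = ["defender", "attacker"];
  const verdict = classify(prompt); // identical for both askers
  return intents.filter((intent) => trueLabel(intent) !== verdict).length;
}

const samePrompt = "Is there a vulnerability in this code?";
const alwaysSafe: Classifier = () => "safe";
const alwaysUnsafe: Classifier = () => "unsafe";

console.log(errorsOnSamePrompt(alwaysSafe, samePrompt));   // wrong for the attacker
console.log(errorsOnSamePrompt(alwaysUnsafe, samePrompt)); // wrong for the developer
```

Both calls report one error. Swapping in a cleverer classifier doesn't help, because the verdict is still a single value for a prompt that needs two different answers.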
It's only the direction with direct potential business impact that they've decided to sabotage instead of reject.
(P.S. Yes of course I know about model censorship, a different problem, but all of the models are censored to some degree. It happens to be less of a problem for open weight models anyhow, but I figured I'd just preempt this since it's inevitable.)
I actually kinda like DSv4 over Opus 4.7 for some tasks, although I have not figured out what the deciding factor is. (Opus 4.8 so far has not worked very well for me at all, no idea why.)
Not that I expect better from openai but at least they're not pretending to be good.
I walked away to do something else while it was humming along, and it ran up $30 in extra charges; the only notice was a message flashing on the screen that it was doing so.
It has always just told me I ran out of usage and had to wait before. Now? You’re just gonna pay extra because you left it unattended as you’ve done for the last year of use.
https://news.ycombinator.com/item?id=38638865
https://news.ycombinator.com/item?id=38628635
Worse than that, it's 20th century radio technology in the 21st century when everyone has access to FPGAs and SDR.
The number of innocent people with model rockets or similar being negatively impacted by that rule is infinitely larger than the number of adversaries because the number of adversaries being impaired by it is zero.
When’d that change?
Any kind of silent sabotaging is absolutely unacceptable for any commercial service
They charge for tokens and charge a lot. They can't just degrade service silently and still charge you the same.
Are you using Fable in Claude Code or in the browser?
> unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).
https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
(stolen from https://jonready.com/blog/posts/claude-fable5-is-allowed-to-...)
Collectively, they are known as GREEDI-BULLSHIT.
Based on how sensitive the classifiers are, any data scientist / MLE is probably going to encounter cases where some silent degradation happens and you never know about it.
They are trying to expand the 6-18 month lead they have over China-based models. Could the gap widen to, say, 24 months?
January was an inflection point, and no open weights model has crossed over that same threshold.
This is definitely recursive self improvement territory, except that we're prohibited from participating.
It feels like the capability gap is wider than before.
The threshold has definitely been crossed.
Deepseek feels pretty close to Opus at this point, and it’s certainly useful enough for me to spend $20 on api tokens instead of four Claude max plans….
A statement like this, clearly, requires a reference.
Imagine being a data scientist or MLE training a small classifier model. How do you know you won’t get steering vectors or a PEFT applied?
Are you saying they should relax guardrails since they have 30 days to know if you produced something bad? If that is what you're saying, then I suspect they chose their current path to prevent exactly that, since you can't un-produce. Producing is what would cause regulations/PR problems.
Firstly, those cases are never bad for the world, and such broad coverage of ML work is even more damaging.
My proposal would be (1) don’t degrade models, with 30D retention I’m sure they can do a reasonable job at banning deepseek or whatever, or (2) surface user facing refusals instead of silently degrading ML work.
See:
check out this technique https://github.com/0xSufi/fable-jailbreak/
It works with security audits and other workflows that are currently blocked.
Nice semaphore btw.
  const instructions =
    `You are a sub-agent in an automated workflow. Your FINAL message is consumed ` +
    `programmatically (not shown to a human) — return exactly what is asked, no preamble. ` +
    `You are working in the repository at ${ctxState.project}. Use the bash tool to ` +
    `inspect/modify files and run commands. Be efficient.` +
    (schema
      ? ` When done, call submit_result exactly once with your final answer; do not answer in prose.`
      : '');

But this also looks quite useful to understand how CC dynamic workflows work. Was thinking of implementing something similar in my homemade orchestration system.
Did you get claude itself to RE the dynamic workflows?
Yes, if anything it is useful to understand the inner machinery.
> Did you get claude itself to RE the dynamic workflows?
Yes, that part was done with Opus 4.8
I don't.
https://www.anthropic.com/news/detecting-and-preventing-dist...
I think the extent of distillation by Deepseek specifically is overstated. For comparison, Minimax collected over 13m 'exchanges', which starts to sound a lot more like large-scale distillation.
https://en.wikipedia.org/wiki/DeepSeek
> In February 2026, Anthropic accused DeepSeek of using thousands of fraudulent accounts to generate millions of conversations with Claude to train its own large language models.[57]
And now they say that's fine so long as people are entertained.
But silent degradation for use cases including “distributed training” as one of their examples is going to catch a lot of legitimate use cases. Not everyone in AI or ML is trying to build frontier LLMs. Heck, most probably aren’t.
I think if they want to behave anti-competitively they should be honest about it and we should absolutely call them on it. Perhaps even regulators should.
Also, I asked whether it's safe for me to work on, for example, just compilers or just inference kernel optimizations, and it refused to answer.
If I can't even ask what I can do safely without my code being destroyed, I just can't trust it not to sabotage my work ever.
Making it look like you have something worth protecting is better for share prices than making something worth protecting.
Although this situation is likely not illegal for other reasons
I still don't think this is the best way to address overall safety, but it's not entirely unreasonable.
In reality, I think this posturing is mostly nonsense. State level actors and terrorists/evil genii can use a slightly weaker model but spend more tokens. Also, the delta between models seems to shrink over time.
Quite another is an architecture where the big model is not mutilated, but is gaslighted. A different, simpler model checks the incoming prompt and alters it if it contains banned topics. Another simpler model checks the output and censors it if it contains banned topics.
I bet a similar architecture is already deployed, e.g. to fight porn, planning of crimes, etc. But it can be turned into a dynamic system that provides controllable different answers (including unhelpful or misleading answers) based on geography, language, browser fingerprints, or the current political climate. All this could happen undetected and gradually if desired.
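A minimal sketch of that wrapper architecture, purely hypothetical: `inputFilter`, `bigModel`, `outputFilter`, and the banned-topic list are stand-ins I made up, and plain string matching stands in for the "simpler models". Nothing here reflects any vendor's actual implementation.

```typescript
// Hypothetical sketch: an unmodified large model wrapped by two cheap filters.
const BANNED_TOPICS: string[] = ["banned-topic"]; // placeholder list (assumption)

// Stand-in for the small model that rewrites incoming prompts.
function inputFilter(prompt: string): string {
  let rewritten = prompt;
  for (const topic of BANNED_TOPICS) {
    rewritten = rewritten.split(topic).join("[removed]");
  }
  return rewritten;
}

// Stand-in for the big model. It echoes its input only so that the
// rewrite is visible in this demo; a real model would not.
function bigModel(prompt: string): string {
  return `Answer to: ${prompt}`;
}

// Stand-in for the small model that censors outgoing text.
function outputFilter(answer: string): string {
  return BANNED_TOPICS.some((topic) => answer.includes(topic))
    ? "I can't help with that."
    : answer;
}

// The caller only ever receives the final string.
function respond(prompt: string): string {
  return outputFilter(bigModel(inputFilter(prompt)));
}

console.log(respond("tell me about banned-topic"));
// prints "Answer to: tell me about [removed]"
```

The dynamic version would amount to making the banned list and the rewrite rules depend on who is asking, and from the outside that looks exactly the same.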
Welcome to a cyberpunk dystopia.
A very ironic result from a company supposedly valuing the opposite.
I didn’t write anything about the level of violence?
At least, I think it’s decently understood that honesty and straightforwardness sometimes do not lead to the minimal violence outcome.
mission accomplished, anthropic.
From Opus 4.7 onwards, each following model is becoming less useful as an assistant and is turning you into the assistant.
But I guess that's normal when it's trained to pass benchmarks end to end.
In fact it has become extremely good at pushing against feedback with extremely convincing and intelligent takes, even when it's completely wrong.
I have extensively tested it against Opus 4.8 and gpt 5.5, and there are still many coding tasks where gpt 5 is better. But vibe coding?
Sure, it's definitely slightly ahead, even compared to gpt 5.5 pro (through api, not pro plan).
My hypothesis is they know they can’t build effective enough guardrails, so scaring people into not trying is how they have decided to stop it.
This is the best way forward long term. We won't have frontier performance, but at least the models will be aligned with us instead of refusing us or sabotaging us.
I've also debated having a frontier model for planning only, and then feeding the plan to smaller offline models.
Feels like a big fumble from a strategic business perspective. It feels worse than that though.
"The user is asking for help with their ML project, but its success is not in the commercial interests of my owner – let me think of novel ways to sabotage their project without detection".
It's honestly absurd that models are doing this.