Frankly, that sounds exactly like Chat Control and similar recurring attempts to enact total surveillance here in the EU (now shifted to heavy-handed age verification and various politicians touting bans on VPNs). I don't want to abandon my continent of birth, though...
hint: they're publicly traded
This goes to show that:
- All that interpretability/safety research they are doing can also be weaponized against customers (steering vectors, intent classification, ...) in the name of safety from malicious actors.
- If they deem it profitable, they might nerf the original model and its training data for ML research at bulk scale, and they won't even have to announce it so long as the overall benchmark score stays high enough.
As the IPOs get closer, they can do whatever they want to assure the investors that they have a moat that can not be crossed over by their own products. Considering this affects all ML researchers/students at universities, smaller scale research labs, this is just "cutting the branch you are sitting on".
Humans can maintain a long- and medium-term memory of constraints that they consciously (or subconsciously!) apply to the code that they write. The current crop of AIs are all amnesiacs, like the protagonist in Memento, falling back onto general instead of institutional knowledge.
For now, we are safe. We can rent out our meat brains for money for a little while longer.
Next year? Who knows...
You never knew to begin with, now you have an explicit reason to realize this. Any black box run entirely out of your control, where you can never verify the output, is subject to the same suspicion.
“Fool me once, shame on you. Fool me twice, shame on me. Fool me three times, shame on both of us.” -- S. King
Some things are more obscure than others. It's easier to trust and verify Office SaaS than AI SaaS. The determinism and obviousness of most other activities make them less susceptible to hidden interference. AI run by someone else is the next level of black box for users compared to most other objects or services we usually interact with.
But if Anthropic gets their way with regulatory capture, this could be the only future we'll see.
To think that they didn't expect the backlash speaks volumes about how many shady things they're doing that aren't publicly known.
Since currently there's no way to verify if poisoning happened or not, I don't trust Anthropic anymore, regardless of what they say.
But my trust towards OAI is also brittle - what if they also do it, or start doing it?
I want to have a verifiable way to know that the prompt I sent was the prompt the model received. I want to know if anything was injected as well - I understand they may not necessarily be able to reveal the exact steering, but at least give me the steering category and its hash or something.
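A minimal sketch of what such a check might look like on the client side. Everything here is hypothetical: no provider is known to return a receipt like this, and the field names (`prompt_sha256`, `injections`, `category`, `sha256`) are made up for illustration. The idea is only that the client hashes what it sent and compares that with what the provider says the model received, while any injected steering is disclosed by category and hash rather than by content.

```python
import hashlib
import hmac


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_receipt(sent_prompt: str, receipt: dict) -> dict:
    """Compare the prompt we sent against a hypothetical provider receipt.

    `receipt` is an assumed structure, not a real API response:
      {"prompt_sha256": "...",
       "injections": [{"category": "...", "sha256": "..."}]}
    """
    expected = sha256_hex(sent_prompt)
    # compare_digest does a constant-time comparison of the two hex strings
    prompt_intact = hmac.compare_digest(expected, receipt.get("prompt_sha256", ""))
    # Steering is listed by category and hash only; the content stays private
    injected = [(i["category"], i["sha256"]) for i in receipt.get("injections", [])]
    return {"prompt_intact": prompt_intact, "injections": injected}


# Usage with a made-up receipt
prompt = "Refactor this function"
receipt = {
    "prompt_sha256": sha256_hex(prompt),
    "injections": [{"category": "safety", "sha256": sha256_hex("hidden steering text")}],
}
result = check_receipt(prompt, receipt)
print(result["prompt_intact"], [category for category, _ in result["injections"]])
```

A hash like this only helps if the receipt itself can be trusted (for example, if it were signed by something the provider cannot quietly change); that part is out of scope for this sketch.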
Again, it’s the only refusal I’ve gotten for coding/agentic tasks, and it has a basis in law somewhere, so I don’t fault OpenAI for that.
I suspect this is surprising to folk because they aren’t the ones busy figuring out how to use LLMs for illegal acts.
In general, HN users focus on making stuff, and not the safety side of things, or the scale of harms being enabled via LLMs and generative AI.
If you are on the safety side of things the ratio of misuse to fair use is inverted and everything is at scale.
Transparency won for now, but OpenAI will also have to contend with the long tail of harms LLMs enable, and that’s going to conflict with letting customers have all the features of frontier models.
The correlation between how bad an AI safety risk actually is and how much the companies in question will actually talk about it is almost perfectly negative. The poster child of this is AI superintelligence; companies love to talk about how dangerous the AI they are actively trying to build is. But superintelligence is also a really vague concept without a clear definition. If we naively define it as "an AI system that is better than a human in some aspect", then it already exists. These models already read and write at superhuman speed.
"That's not real superintelligence!" you say. But that's exactly the capability you need in order to flood every online forum with an unending tide of AI slop. And I don't remember, say, OpenAI saying they were shutting down Sora because it was destroying or defacing human culture[1]. They shut down Sora because it was way too expensive to run.
Meanwhile, Sam Altman went and bragged about how he wants ChatGPT to make erotica. Y'know, as if we don't already know that character.ai gooning is about as safe for your mental health as Action Park was for your physical health. But porn is also a huge market, so obviously he and all the other AI companies want in on it, even though the "sexy suicide coach" is already a well-documented harm of AI.
And the idea that distillation is an attack is laughable. Like, I get the logic - if someone can ask the AI to make another AI then they get to change the guardrails - but it's still ultimately just Anthropic objecting to their own conduct when it happens to them. All their models are trained on nonconsensually harvested data. There is no moral or legal principle where Anthropic gets to use my data without permission but I don't get to use theirs.
Furthermore, AI safetyism runs up against "Freedom Zero", a core tenet of the Free Software ethos: you should be allowed to use software in any way you choose. This is not a call for more people using AI for evil, but a call to recognize that people should be allowed to use their property as they wish. Making software disobey its owner is malicious behavior. And every single time safety considerations are brought up it is to justify further attacks on Freedom Zero. And these justifications are always self-serving. There is no context in the world where a frontier AI lab asking someone else's AI about AI research is intrinsically harmful; especially not to the point where we need to make Claude deliberately sabotage your work. That is malware. Anthropic shipped malware. This is inexcusable.
[0] Digital or biological.
Even wide open, uncensored models are often the product of a deliberate choice. I have a hard time faulting people for intentionality (even when they get it wrong).
> Anthropic requires 30 day data retention for Fable and Mythos
https://news.ycombinator.com/item?id=48464258
I used to be able to tell my enterprise customers something simple, that I really believe: "We use Anthropic models via Bedrock/Azure, therefore we are guaranteed that your data will not be used for training models."
That simple blanket statement is no longer true. Also, most normal people/customers only read headlines, and this is a huge story. From my point of view, as someone deploying LLMs in my apps, trust comms with my clients just got set back two years.
You should never use any of the frontier models with operational workloads manipulating or interpreting customer data.
Does that mean the latest model, hosted by the lab, Bedrock, or Azure Foundry? Or, do you mean only use self-hosted models, or what did you mean by that? I would really love to learn what others are doing. I felt like my trust story was solid enough, prior to all this. I have been deploying and integrating Claude and Sonnet (latest 4.x-2), on Azure, as my client base has MS contract trust, for better or worse, and Anthropic models have been making my products amazing.
To see my other thoughts on this cluster f, please see: https://news.ycombinator.com/item?id=48488781
Say you have some flow that is processing/handling regulated, sensitive or other customer data with the LLM as part of an operational process. An example that I'm thinking of is for a customer who wants to more efficiently resolve or route IT incidents to the right place. The incident data may contain user-provided data that has strings attached from a compliance perspective.
If you're using a third-party API, your T&Cs are the only protection that you have. Microsoft/Google/Amazon are pretty decent by default. When I worked for the government, we had the leverage to extract much more favorable terms from the big vendors like Google, Amazon, and Microsoft as well. Anthropic and OpenAI are in the move-fast-and-break-things universe: you need to bring a lot of money to the table to get terms changed, and you can easily stumble into a situation where they are retaining data in a manner that your customer will not like. So unless the customer is informed and accepting of that risk, proceed with caution.
I've had some success using self-hosted inference for these scenarios.
For development of software, totally different story -- it's your IP and you make the risk call.
If you read my rant linked previously, yeah... we are on the same page. As another user pointed out in that thread, the issue here is that even on Bedrock and Azure Foundry, now with Fable 5, Anthropic inserts themselves as an additional data subprocessor that we would have to consider and certainly disclose, correct?
That kind of destroys the whole point of using Bedrock/Azure for the model, doesn't it?
It was definitely sold as “Anthropic IP, through your old pals at the hyperscaler”. And it’s turning into something else: I’m having lunch with AWS and this other guy showed up with them.
They claim they're not using it for training, only for "safety", and in fact I believe them. If you think they're lying, then why didn't you think they were lying about zero retention before? And "don't throw this in the training bin" is a relatively easy policy for them to get right. Especially because, no matter what your "enterprise leaders" tell themselves, your queries probably have close to zero real training value.
What I don't believe is that they can guarantee it won't leak to non-training parts of Anthropic, leak to or be stolen by outside actors, or be coerced out of them. That risk comes from creating the record in the first place, and that is the problem.
Some pretty audacious hypocrisy from Anthropic this week.
Silent treatment is a breach of trust: what you buy changes depending on the context, based on the goals of the producer. It is like your computer silently blocking ads from competitors at the hardware level, which is crazy. I think they erred on the wrong side of things due to IPO pressure.
At least there is competition from multiple companies. Still it is best to have personal benchmarks for the domain you are working on to have a real evaluation of the value you get for the money/time you spent on these products. Without trust, that might be the only way forward to keep the companies honest.
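A personal benchmark does not need to be elaborate. Below is a minimal sketch, assuming you can wrap whichever product you pay for in a plain `prompt -> answer` function; `run_benchmark`, `stub_model`, and the cases are all hypothetical names and placeholders for your own domain tasks. The point is a fixed set of prompts with automatic pass/fail checks that you re-run whenever the vendor changes something.

```python
from typing import Callable

Case = tuple[str, Callable[[str], bool]]


def run_benchmark(model: Callable[[str], str], cases: list[Case]) -> float:
    """Return the fraction of cases whose check passes on the model's answer."""
    if not cases:
        return 0.0
    passed = 0
    for prompt, check in cases:
        try:
            if check(model(prompt)):
                passed += 1
        except Exception:
            # An error from the model or the check counts as a failed case
            pass
    return passed / len(cases)


# Stand-in for a real API call; replace with your own wrapper
def stub_model(prompt: str) -> str:
    return {"2+2?": "4"}.get(prompt, "I can't help with that.")


CASES: list[Case] = [
    ("2+2?", lambda out: out.strip() == "4"),
    ("Name a prime above 10", lambda out: "11" in out),
]

score = run_benchmark(stub_model, CASES)
print(f"pass rate: {score:.0%}")
```

Logging the score per model and per date gives you a record of silent regressions (or silent refusals) in your own domain, independent of the headline benchmark numbers.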
This happens eventually in all sectors; a good magazine/website that does independent product evaluation is priceless. Sadly, the new ad-driven internet decimated those that worked great in the 90s/00s. Still, there are independent blogs that do some evaluation, and that is better than nothing.
I mean, did nobody ever get the vibes, never see a pattern emerging? (well they don't or they wouldn't be so amazed by pattern recognition machines on steroids)
Unilaterally revoking zero-data retention, even for enterprise contracts that explicitly require that? Nope.
Fable is utterly unusable for any kind of security work. I tripped the safeguards yesterday - using Fable to dig into a complex (& annoying) security bug that has so far resisted both human and Opus 4.8 level investigation. "Sorry Dave, I can't let you do that."
For the time being we are requesting Anthropic disable Fable for our enterprise and turn ZDR back on. The two may be interlinked so that one will always get neither or both. ZDR is a contractual obligation. Fable in its current form is useless. Might as well flip the old behaviour on and avoid burning money for no reason while this mess is being sorted out.
For generating the initial 3D simulated safe using three.js it worked well, but then modifications to print a flag tripped the safeguards. I eventually narrowed it down to the part of the prompt about it being for a CTF for students: the model's "thinking" seems to drift to ideas of encryption/obfuscation of the safe combo so students can't just read out the answer... which makes sense logically to help force students into turning the simulated dial instead. But whatever detection Anthropic uses, I guess, just naively sees the model thinking about "encryption" and "obfuscation" without taking into account any of the context.
For writing the dummy firmware, it tripped the safeguards while thinking about how to track dial position in the firmware and output the message; however, when I left out talk about safes and just told it to write firmware for a microcontroller hooked up to an i2c display for showing a message with a beam break sensor to determine the message, and an unspecified i2c chip for getting an unspecified number (e.g. internal wheel positions) it worked fine.
For an unrelated software task, I asked it to write some code to translate CustomActions in a Windows MSI installer into human-readable stuff, which has (exclusively?) defensive security applications for recognizing malicious behavior in an MSI installer. Maybe I'm going crazy, but I'm guessing as part of its research into MSI installer custom actions Fable found articles about analyzing malicious MSI installers, and that probably tripped the safeguards.
Overall my impression is that the safeguards are perhaps using an overzealous and naive implementation that just looks for a list of banned words in the prompt or the thinking -- which drives me crazy when the model says my prompt looks fine, and then 10 minutes in some part of the thinking trips the safeguard.
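To illustrate the kind of behaviour being guessed at here (this is not a known implementation, just a sketch of a context-blind word list, with a hypothetical `BANNED` set and `naive_flag` helper), a filter like the following would flag the benign CTF reasoning described above as readily as anything actually malicious:

```python
import re

# Hypothetical word list, chosen only to mirror the examples in this thread
BANNED = {"encryption", "obfuscation", "malicious"}


def naive_flag(text: str) -> set[str]:
    """Return the banned words present in text, ignoring all context."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & BANNED


thinking = (
    "Add obfuscation of the safe combo so students have to turn the "
    "simulated dial instead of reading the answer from the source."
)
print(naive_flag(thinking))
```

Because the check runs on every chunk of text, a prompt that passes can still trip it later, once the model's own reasoning wanders onto a listed word.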
Unilaterally disabling ZDR seems like a step too far in the enterprise market, even for a company trying to figure out what its users will let it get away with.
Our org has ZDR, and has had it since the contract was signed. Yesterday two things held true at the same time:
1. Fable was available if you had at least .170 CLI client; and
2. ZDR was no longer on
By the time the West Coast woke up, the admin panel apparently had an option to toggle ZDR again. It remained off by default. Somewhere along the line we also used the self-service toggle to turn ZDR back on. I am not 100% certain of the exact timeline of interleaving events; many of the actions were taken by our Western US folks. Sorry. It's been a bit hectic over the past ~36h...
To be precise - it makes the "won't work on frontier machine learning" refusal the same as the "won't work on cyber security" refusal (instead of the way it previously would work on frontier machine learning problems but give sub-optimal answers without informing the user)
Of course, it’s impossible to know if that was deliberate sabotage, or model misbehaviour. Which is exactly the problem.
That may be considered malware / a criminal act tbh.