upvote
I work on open source text-to-image finetuning of open source models like zimage/flux2 klein 4b and inference time latency optimization. The moment I read the silent treatment, I went ahead and cancelled my subscription too since I would never know whether the models they launch will silently corrupt my output. This is totally unacceptable. There is a big difference between silent / flagged if you are doing ml research but not at frontier capability.

This goes on to show that - All that interpretability / safety research they are doing can also be weaponized against customers (steering vectors, intent classification, ...) in the name of safety from malicious actors. - If they deem profitable, they might nerf to original model and its training data for ml research at a bulk scale and then they won't even have to announce it so long as the overall benchmark score stays high enough.

As the IPOs get closer, they can do whatever they want to assure the investors that they have a moat that can not be crossed over by their own products. Considering this affects all ML researchers/students at universities, smaller scale research labs, this is just "cutting the branch you are sitting on".

reply
I think all this started with post opus 4.5, that's when claude started wrecking my shit without extreme oversight. Codebases it was making positive contributions to before were slowly and constantly being eroded and wrecked. Give it tasks in isolation? still does well, but the moment it sees the bigger picture, it goes to shit. I chalked it up to a bad model but this makes it all seem like it may have been by design in retrospect.
reply
Constraint decay is an issue with all LLM-based agentic development, at least for now.

Humans can maintain a long- and medium- term memory of constraints that they consciously (or subconsciously!) apply to the code that they write. The current crop of AIs are all amnesiacs, like the protagonist in Memento, falling back onto general instead of institutional knowledge.

For now, we are safe. We can rent out our meat brains for money for a little while longer.

Next year? Who knows...

reply
> I would never know whether the models they launch will silently corrupt my output

You never knew to begin with, now you have an explicit reason to realize this. Any black box run entirely out of your control, where you can never verify the output, is subject to the same suspicion.

reply
True enough, but that is true for all the products I buy. I do not expect to control every product I own. For some I prefer to have more control, for others I just need something that works out of the box. There is always an initial bias for trust when you buy something otherwise you would not spend your hard earned money on it.

“Fool me once, shame on you. Fool me twice, shame on me. Fool me three times, shame on both of us.” -- S. King

reply
> but that is true for all the products I buy

Some things are more obscure than others. It's easier to trust and verify Office SaaS than AI SaaS. The determinism and obviousness of most other activities make them less susceptible to hidden interference. AI run by someone else is the next level of black box for users compared to most other objects or services we usually interact with.

reply
OpenAI has a real opportunity to do some sort of "we don't maliciously alter your prompt and nerf the model" with some form of verification, when they release the next model.

But if Anthropic gets their way with regulatory capture, this could be the only future we'll see.

To think that they didn't expect the backlash speaks volumes about how much shady things they're doing which is not publicly known.

reply
OpenAI has been the absolute worst about this, historically. I found myself having to change my queries because it refused to serve things it deemed insensitive.
reply
Yes, that's true. Excluding Fable, OAI models are the most refusal heavy. However, I'd rather get a refusal than response with poisoned output.

Since currently there's no way to verify if poisoning happened or not, I don't trust Anthropic anymore, regardless of what they say.

But my trust towards OAI is also brittle - what if they also do it, or start doing it?

I want to have a verifiable way to know that the prompt I sent was the prompt the model received. I want to know if anything was injected as well - I understand they may not necessarily be able to reveal the exact steering, but at least give me the steering category and its hash or something.

reply
What kind of work are you getting refusals on? Genuinely curious. The only refusal I’ve had in recent memory was declining to find doorbell camera footage matching a certain description, which is fair enough and I think EU laws heavily restrict such activities (even tho I’m not in the EU)
reply
During Iran shutdowns I've been researching what ways Iranians manage to get to the internet by mimicking as whitelisted resources (such as hcapcha). ChatGPT had refused to lookup information written in Farsi since "circumventing state regulation is a crime".
reply
How would the AI be able to find the footage itself?
reply
I use Codex and wanted it to sort through the footage and use subagents to review. Codex limits are fairly generous, esp paired with mini models for this kind of task generally, but even GPT5.5 usage is still pretty generous.

Again, it’s the only refusal I’ve gotten for coding/agentic tasks, and it has a basis in law somewhere, so I don’t fault OpenAI for that.

reply
Eh, I expect open Ai to follow suit.

I suspect this is surprising to folk because they aren’t the ones busy figuring out how to use LLMs for illegal acts.

In general, HN users focus on making stuff, and not the safety side of things, or the scale of harms being enabled via LLMs and generative AI.

If you are on the safety side of things the ratio of misuse to fair use is inverted and everything is at scale.

Transparency won for now, but OpenAI will also have to contend with the long tail of harms LLMs enable, and that’s going to conflict with letting customers have all the features of frontier models.

reply
Building distributed training pipelines or optimising your ML stack (examples called out in the model card) isn’t harmful.
reply
Yes, but there is a very specific subset of things AI companies will and won't cite safety for as a concern, and that subset intersects neatly with things the companies consider to be business risks. Like, the main reason why AI companies are so willing to poison the well is because there's no money in selling to the kinds of people who want to write malware[0].

The correlation between how bad an AI safety risk actually is and how much the companies in question will actually talk about it is almost perfectly negative. The poster child of this is AI superintelligence; companies love to talk about how dangerous the AI they are actively trying to build is. But superintelligence is also a really vague concept without a clear definition. If we naively define it as "an AI system that is better than a human in some aspect", then it already exists. These models already read and write at superhuman speed.

"That's not real superintelligence!" you say. But that's exactly the capability you need in order to flood every online forum with an unending tide of AI slop. And I don't remember, say, OpenAI saying they were shutting down Sora because it was destroying or defacing human culture[1]. They shut down Sora because it was way too expensive to run.

Meanwhile, Sam Altman went and bragged about how he wants ChatGPT to make erotica. Y'know, as if we don't already know that character.ai gooning is about as safe for your mental health as Action Park was for your physical health. But porn is also a huge market, so obviously he and all the other AI companies want in on it, even though the "sexy suicide coach" is already a well-documented harm of AI.

And the idea that distillation is an attack is laughable. Like, I get the logic - if someone can ask the AI to make another AI then they get to change the guardrails - but it's still ultimately just Anthropic objecting to their own conduct when it happens to them. All their models are trained on nonconsensually harvested data. There is no moral or legal principle where Anthropic gets to use my data without permission but I don't get to use theirs.

Furthermore, AI safetyism runs up against "Freedom Zero", a core tenet of the Free Software ethos: you should be allowed to use software in any way you choose. This is not a call for more people using AI for evil, but a call to recognize that people should be allowed to use their property as they wish. Making software disobey its owner is malicious behavior. And every single time safety considerations are brought up it is to justify further attacks on Freedom Zero. And these justifications are always self-serving. There is no context in the world where a frontier AI lab asking someone else's AI about AI research is intrinsically harmful; especially not to the point where we need to make Claude deliberately sabotage your work. That is malware. Anthropic shipped malware. This is inexcusable.

[0] Digital or biological.

[1] https://www.youtube.com/watch?v=YCPAIg7RUq8

reply
I cancelled mine immediately too. Anyone who supports open models will sympathize.
reply
that you still had max after all their deceptions is amazing
reply
Yeah; not my smartest decision given their ongoing “issues”
reply
You've been Stuxnet-ed by Anthropic :)
reply