undefined

upvote

points

by walthamstow13 hours ago |

upvote

by embedding-shape12 hours ago|

[-]

Even better, adding it to the system prompt is a temporary fix, then they'll work it into post-training, so next model release will probably remove it from the system prompt. At least when it's in the system prompt we get some visibility into what's being censored, once it's in the model it'll be a lot harder to understand why "How many calories does 100g of Pasta have?" only returns "Sorry, I cannot divulge that information".

reply

upvote

by gchamonlive12 hours ago|

[-]

Just assume each model iteration incorporates all the censorship prompts before and compile the possible list from the system prompt history. To validate it, design an adversary test against the items in the compiled list.

reply

upvote

by jeffrwells5 hours ago|

[-]

Another way to think about it: every single user of Claude is paying an extra tax in every single request

reply

upvote

by zythyx40 minutes ago|

[-]

Isn't it basically the same as paying dust to crypto exchanges when making a transaction - it's so miniscule that it's not worth caring about?

reply

upvote

by teaearlgraycold3 hours ago|

[-]

Well the system prompt is probably permanently cached.

reply

upvote

by dymk2 hours ago|

[-]

Takes up a portion of the context window, though

reply

upvote

by whateveracct1 hours ago|

[-]

And the beginning of the context window gets more attention, right?

reply

upvote

by wongarsu1 hours ago|

[-]

On API pricing you still pay 10% of the input token price on cache reads. Not sure if the subscription limits count this though.

And of course all conversations now have to compact 80 tokens earlier, and are marginally worse (since results get worse the more stuff is in the context)

reply

upvote

by zozbot2344 hours ago|

[-]

That part of the system prompt is just stating that telling someone who has an actual eating disorder to start counting calories or micro-manage their eating in other ways (a suggestion that the model might well give to an average person for the sake of clear argument, which would then be understood sensibly and taken with a grain of salt) is likely to make them worse off, not better off. This seems like a common-sense addition. It should not trigger any excess refusals on its own.

reply

upvote

by MoltenMan3 hours ago|

[-]

The problem is that this is an incredibly niche / small issue (i.e. <<1% of users, let alone prompts, need this clarification), and if you add a section for every single small thing like this, you end up with a massively bloated prompt. Notice that every single user of Claude is paying for this paragraph now! This single paragraph is going to legitimately cost anthropic at least 4, maybe 5 digits.

At some point you just have to accept that llm's, like people, make mistakes, and that's ok!

reply

upvote

by alwillis2 hours ago|

[-]

>The problem is that this is an incredibly niche / small issue (i.e. <<1% of users, let alone prompts

It's not a niche issue at all. 29 million people in the US are struggling with an eating disorder [1].

> This single paragraph is going to legitimately cost anthropic at least 4, maybe 5 digits.

It's 59 out of 3,791 words total in the system prompt. That's 1.48%. Relax.

It should go without saying, but Anthropic has the usage data; they must be seeing a significant increase in the number of times eating disorders come up in conversations with Claude. I'm sure Anthropic takes what goes into the system prompt very seriously.

[1]: from https://www.southdenvertherapy.com/blog/eating-disorder-stat...

The trajectory is troubling. Eating disorder prevalence has more than doubled globally since 2000, with a 124% increase according to World Health Organization data. The United States has seen similar trends, with hospitalization rates climbing steadily year over year.

reply

upvote

by zozbot2343 hours ago|

[-]

It's not "incredibly niche" when you consider the kinds of questions that average everyday users might submit to these AIs. Diet is definitely up there, given how unintuitive it is for many.

> At some point you just have to accept that llm's, like people, make mistakes, and that's ok!

Except that's not the way many everyday users view LLM's. The carwash prompt went viral because it showed the LLM making a blatant mistake, and many seem to have found this genuinely surprising.

reply

upvote

by mudkipdev2 hours ago|

[-]

The Claude prompt is already quite bloated, around 7,000 tokens excluding tools.

reply

upvote

by layer83 hours ago|

[-]

If it’s common sense, shouldn’t the model know it already?

reply

upvote

by zozbot2343 hours ago|

[-]

Shouldn't the model "know" that if I have to wash my car at the carwash, I can't just go there on foot? It's not that simple!

reply

upvote

by WarmWash12 hours ago|

[-]

When you are worth hundreds of billions, people start falling over themselves running to file lawsuits against you. We're already seeing this happen.

So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.

reply

upvote

by goosejuice4 hours ago|

[-]

It is a no brainer. If a company of any size is putting out a product that caused cancer we wouldn't think twice about suing them. Why should mental health disorders be any different?

reply

upvote

by bojan4 hours ago|

[-]

There are many, many companies out there putting out products that cause cancer. Think about alcohol, tobacco, internal combustion engines, just to name a few most obvious examples.

reply

upvote

by fineIllregister3 hours ago|

[-]

> alcohol, tobacco, internal combustion engine

Yes, the companies providing these products are sued a lot and are heavily regulated, too.

reply

upvote

by ChadNauseam2 hours ago|

[-]

If you get cancer from drinking alcohol, smoking cigarettes or breathing particles emitted by ICE engines in their standard course of operation, you generally can't sue the manufacturer.

reply

upvote

by WarmWash29 minutes ago|

[-]

I think a more apt analogy would be suing a vaccine manufacturer after it gave you adverse effects, when you also knew you were high risk before that.

reply

upvote

by arcanemachiner3 hours ago|

[-]

Why stop there? We could jam up the system prompt with all kinds of irrelevant guardrails to prevent harm to groups X, Y, and Z!

reply

upvote

by echelon5 hours ago|

[-]

It's so shameful.

We let people buy kitchen knives. But because the kitchen knife companies don't have billions of dollars, we don't go after them.

We go after the LLM that might have given someone bad diet advice or made them feel sad.

Nevermind the huge marketing budget spent on making people feel inadequate, ugly, old, etc. That does way more harm than tricking an LLM into telling you you can cook with glue.

reply

upvote

by gmac5 hours ago|

[-]

I don’t feel like that’s a reasonable analogy. Kitchen knives don’t purport to give advice. But if a kitchen knife came with a label that said ‘ideal for murdering people’, I expect people would go after the manufacturer.

reply

upvote

by mattjoyce5 hours ago|

[-]

Ad companies prompt injecting consumers. LLM companies countering with guardrails.

reply

upvote

by ikari_pl3 hours ago|

[-]

Are the prompts used both by the desktop app, like typical chatbot interfaces, and Claude Code?

Because it's a waste of my money to check whether my Object Pascal compiler doesn't develop eating disorders, on every turn.

reply

upvote

by seba_dos14 hours ago|

[-]

It feels like half of AI research is math, and the other half is coming up with yet another way to state "please don't do bad things" in the prompt that will sure work this time I promise.

reply

upvote

by rzmmm11 hours ago|

[-]

The alignment favors supporting healthy behaviors so it can be a thin line. I see the system prompt as "plan B" when they can't achieve good results in the training itself.

It's a particularly sensitive issue so they are just probably being cautious.

reply

upvote

by echelon5 hours ago|

[-]

I want a hyperscaler LLM I can fine tune and neuter. Not a platform or product. Raw weights hooked up to pure tools.

This era of locked hyperscaler dominance needs to end.

If a third tier LLM company made their weights available and they were within 80% of Opus, and they forced you to use their platform to deploy or license if you ran elsewhere, I'd be fine with that. As long as you can access and download the full raw weights and lobotomize as you see fit.

reply

upvote

by mohamedkoubaa3 hours ago|

[-]

Starting to feel like a "we were promised flying cars but all we got" kind of moment

reply

upvote

by 3 hours ago|

[-]

deleted

reply

upvote

by l5870uoo9y4 hours ago|

[-]

Could be that Claude has particular controversial opinions on eating disorders.

reply

upvote

by rcfox1 hours ago|

[-]

There are communities of people who publicly blog about their eating disorders. I wouldn't be surprised if the laymen's discourse is over-represented in the LLM's training data compared to the scientific papers.

reply

upvote

by dwaltrip4 hours ago|

[-]

LLMs have been trained to eagerly answer a user’s query.

They don’t reliably have the judgment to pause and proceed carefully if a delicate topic comes up. Hence these bandaids in the system prompt.

reply

upvote

by newZWhoDis3 hours ago|

[-]

>the year is 2028 >5M of your 10M context window is the system prompt

reply

upvote

by ls6124 hours ago|

[-]

Yup. Anyone who is surprised by this has not been paying attention to the centralization of power on the internet in the past 10 years.

reply

upvote

by felixgallo12 hours ago|

[-]

I mean, that's what humans have always done with our morals, ethics, and laws, so what alternative improvement do you have to make here?

reply

upvote

by 10 hours ago|

[-]

deleted

reply

upvote

by idiotsecant12 hours ago|

[-]

Imagine the kind of human that never adapts their moral standpoints. Ever. They believe what they believed when they were 12 years old.

Letting the system improve over time is fine. System prompt is an inefficient place to do it, buts it's just a patch until the model can be updated.

reply