And of course every conversation now has to compact 80 tokens earlier, and is marginally worse (since results degrade the more stuff is in the context)
At some point you just have to accept that LLMs, like people, make mistakes, and that's OK!
It's not a niche issue at all. 29 million people in the US are struggling with an eating disorder [1].
> This single paragraph is going to legitimately cost anthropic at least 4, maybe 5 digits.
It's 59 out of 3,791 words total in the system prompt. That's 1.56%. Relax.
It should go without saying, but Anthropic has the usage data; they must be seeing a significant increase in the number of times eating disorders come up in conversations with Claude. I'm sure Anthropic takes what goes into the system prompt very seriously.
[1]: from https://www.southdenvertherapy.com/blog/eating-disorder-stat...
The trajectory is troubling. Eating disorder prevalence has more than doubled globally since 2000, with a 124% increase according to World Health Organization data. The United States has seen similar trends, with hospitalization rates climbing steadily year over year.
> At some point you just have to accept that LLMs, like people, make mistakes, and that's OK!
Except that's not the way many everyday users view LLMs. The carwash prompt went viral because it showed the LLM making a blatant mistake, and many seem to have found this genuinely surprising.
So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.
Yes, the companies providing these products are sued a lot and are heavily regulated, too.
We let people buy kitchen knives. But because the kitchen knife companies don't have billions of dollars, we don't go after them.
We go after the LLM that might have given someone bad diet advice or made them feel sad.
Nevermind the huge marketing budget spent on making people feel inadequate, ugly, old, etc. That does way more harm than tricking an LLM into telling you you can cook with glue.
Because it's a waste of my money to check, on every turn, that my Object Pascal compiler hasn't developed an eating disorder.
It's a particularly sensitive issue, so they are probably just being cautious.
This era of locked hyperscaler dominance needs to end.
If a third-tier LLM company made their weights available and they were within 80% of Opus, and they forced you to use their platform to deploy, or to license it if you ran elsewhere, I'd be fine with that. As long as you can access and download the full raw weights and lobotomize them as you see fit.
They don’t reliably have the judgment to pause and proceed carefully if a delicate topic comes up. Hence these bandaids in the system prompt.
Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.