> And it gets churned by every single request they receive.

Not true; it gets calculated once, essentially baked into the initial state, and stored in a standard K/V prefix cache. Processing only happens on new input (aside from attention, which still has to contend with the tokens from the prompt).
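A toy sketch of what that prefix caching looks like. `prefill` is a hypothetical stand-in for the expensive forward pass that produces K/V tensors for the prompt; the point is just the shape of the cache, not any real runtime's API:

```python
import hashlib

def prefill(tokens):
    # In a real runtime this computes per-token K/V tensors for every layer;
    # here we return a placeholder "state" to show the caching structure.
    return {"kv_len": len(tokens)}

_prefix_cache = {}

def get_prefix_state(system_prompt_tokens):
    key = hashlib.sha256(" ".join(system_prompt_tokens).encode()).hexdigest()
    if key not in _prefix_cache:          # pay the prefill cost only once
        _prefix_cache[key] = prefill(system_prompt_tokens)
    return _prefix_cache[key]             # every later request reuses it

sys_tokens = ["You", "are", "a", "helpful", "assistant"] * 1000
state1 = get_prefix_state(sys_tokens)     # computed on the first request
state2 = get_prefix_state(sys_tokens)     # cache hit, nothing recomputed
assert state1 is state2
```

Any request whose prompt shares the same prefix reuses the cached state; only the user's new tokens go through prefill.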

reply
I’m just surprised this works at all. When I was building AI automations for a startup in January, even 1,000-word system prompts would cause the model to start losing track of some of the rules. You could even have something simple like “never do X” and it would still sometimes do X.
reply
Two things: first, the model and runtime matter a lot; smaller/quantized models are basically useless at strict instruction following compared to SOTA models. Second, "never do X" doesn't work that well; if you want a model to never do X, you need to adjust the harness and/or steer it with "positive prompting" instead. Don't say "Never use uppercase"; say "Always use lowercase only", as a silly example, and you'll get a lot better results. If you've trained dogs before ("positive reinforcement training"), this will come easier to you.
reply
It's interesting to note here that Anthropic indeed don't use "do not X" in the Opus system prompts. However, "Claude does not X" is very common.
reply
I suspect that lets the model "roleplay" as Claude, promoting reasoning like "would Claude do X?" or "what would Claude do in this situation?"
reply
I created a test evaluation (they friggen' stole the word harness) that runs a changed prompt and compares pass/fail, token count, and timing for each change. It is an easy thing to do. The best part is I set up an orchestration pattern where one agent iterates on updating the target agent's prompts. Not only can it evaluate the outcome after the changes, it can update and rerun, self-healing and fixing itself.
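A minimal, self-contained sketch of that loop: an "optimizer" step proposes prompt variants, the eval harness scores each one (pass rate plus a crude token cost), and the best variant wins. `fake_model` is a toy stand-in for a real LLM call, and the cases are made up for illustration:

```python
def fake_model(prompt, text):
    # Toy behavior: only obeys lowercase-ness if the prompt is phrased positively.
    return text.lower() if "always" in prompt else text

CASES = [("Hello", "hello"), ("WORLD", "world")]

def evaluate(prompt):
    passed = sum(fake_model(prompt, inp) == want for inp, want in CASES)
    return passed / len(CASES), len(prompt.split())  # pass rate, "token" cost

def optimize(initial, variants):
    best = initial
    best_score, best_cost = evaluate(initial)
    for v in variants:                    # the "optimizer agent" loop
        score, cost = evaluate(v)
        if (score, -cost) > (best_score, -best_cost):
            best, best_score, best_cost = v, score, cost
    return best, best_score

best, score = optimize(
    "never use uppercase",
    ["always use lowercase only", "respond normally"],
)
assert score == 1.0 and "always" in best
```

In the real version each `evaluate` call is an LLM run and the variants come from another agent reading the failure output, but the control flow is the same.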
reply
I assume the reason it’s not baked in is so they can “hotfix” it after release. but surely that many things don’t need updates afterwards. there are novels that are shorter.
reply
Yeah that was the original idea of system prompts. Change global behaviour without retraining and with higher authority than users. But this has slowly turned into a complete mess, at least for Anthropic. I'd love to see OpenAI's and Google's system prompts for comparison though. Would be interesting to know if they are just more compute rich or more efficient.
reply
Leaked/extracted system prompts for other chat models, particularly ChatGPT, are often around this size. Here's GPT-5.4: https://github.com/asgeirtj/system_prompts_leaks/blob/main/O...
reply
deleted
reply
There are different sections in the markdown for different models. It is only 3,000-4,000 words.
reply
That's usually not how these things work. Only parts of the prompt are actually loaded at any given moment. For example, "system prompt" warnings about intellectual property are effectively alerts that the model gets. ...Though I have to ask in case I'm assuming something dumb: what are you referring to when you said "more than 60,000 words"?
reply
The system prompt is always loaded in its entirety IIUC. It's technically possible to modify it during a conversation but that would invalidate the prefill cache for the big model providers.
reply
What you're describing is not how these things usually work. And all I did was run wc on the .md file.
reply
Surely the system prompt is cached across accounts?
reply
You can cache K and V matrices, but for such huge matrices you'll still pay a ton of compute to calculate attention in the end even if the user just adds a five word question.
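Back-of-envelope arithmetic for that point: cached K/V means no prefill recompute, but each new query token still attends over every cached prompt token. The model dimensions below are illustrative assumptions, not any particular model:

```python
# Even with cached K/V, every NEW query token attends over ALL cached
# prompt tokens: QK^T scores plus the weighted sum over V.
prompt_len = 60_000      # cached system-prompt tokens (K/V reused for free)
new_tokens = 5           # the user's short question
d_model, n_layers = 8_192, 80   # assumed sizes, purely illustrative

# Per layer: two matmul-like passes (QK^T and attn@V), each roughly
# 2 * new_tokens * prompt_len * d_model FLOPs.
flops_per_layer = 2 * 2 * new_tokens * prompt_len * d_model
total_flops = flops_per_layer * n_layers
print(f"~{total_flops / 1e12:.1f} TFLOPs of attention for a 5-token question")
```

So the per-request attention cost scales linearly with the cached prompt length even when the user adds almost nothing.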
reply
I would assume so too, so the costs would not be so substantial to Anthropic.
reply
> And it gets churned by every single request they receive

It gets pretty efficiently cached, but does eat the context window and RAM.

reply