"We use PostgreSQL" reads as a soft preference. The model weighs it against whatever it thinks is optimal and decides you'd be better off with Supabase.
"NEVER create accounts for external databases. All persistence uses the existing PostgreSQL instance. If you're about to recommend a new service, stop." actually sticks.
The pattern that works: imperative prohibitions with specific reasoning. "Do not use Redis because we run a single node and pg_notify covers our pubsub needs" gives enough context that it won't reinvent the decision every session.
Your AGENTS.md should read less like a README and more like a linter config. Bullet points with DO/DON'T rules, not prose descriptions of your stack.
Given my own experience futilely fighting with Claude/Codex/OpenCode to follow AGENTS.MD/CLAUDE.MD/etc with different techniques that each purport to solve the problem, I think the better explanation really is that they just don't work reliably enough to depend on to enforce rules.
But you're right that "better" isn't "reliable." In practice it went from "constantly ignored" to "followed maybe 80% of the time." The remaining 20% is the model encountering situations where it decides the instruction doesn't apply to this specific case.
Honest answer is probably somewhere between "they don't work" and "write them right and you're fine." They raise the floor but don't guarantee anything. I still use them because 80% beats 20%, but I wouldn't bet production correctness on them.