Given my own experience futilely fighting with Claude/Codex/OpenCode to follow AGENTS.MD/CLAUDE.MD/etc with different techniques that each purport to solve the problem, I think the better explanation really is that they just don't work reliably enough to depend on to enforce rules.
But you're right that "better" isn't "reliable." In practice it went from "constantly ignored" to "followed maybe 80% of the time." The remaining 20% is the model encountering situations where it decides the instruction doesn't apply to this specific case.
Honest answer is probably somewhere between "they don't work" and "write them right and you're fine." They raise the floor but don't guarantee anything. I still use them because 80% beats 20%, but I wouldn't bet production correctness on them.