You can always reduce a high-level phenomenon to lower-level mechanisms. That doesn't mean the high-level phenomenon doesn't exist. LLMs are obviously able to understand and follow instructions.
And yet, quite a lot of the time, they don't, and they fail in ways that are hard to predict and sometimes even to notice: their errors can be consequential yet subtle.
They're simply not reliable enough to treat as independent agents, and this story is a good example of why not.
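To make that concrete, here's a minimal sketch in Python of what "don't trust the output" looks like in practice. The model call is a stub (`fake_llm` is hypothetical, standing in for any real LLM API), but the failure mode it simulates is the one described above: output that looks fine, parses fine, and is still wrong in a way only an explicit check will catch.

```python
import json

def fake_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call. The trap is that the
    # occasional malformed or subtly wrong reply looks identical to a
    # good one from the caller's side until something downstream breaks.
    return '{"city": "Paris", "population": "2.1M"}'  # a string, not the int we asked for

def ask_for_json(prompt: str, max_retries: int = 3) -> dict:
    """Call the model and validate its output, retrying on failure."""
    for _ in range(max_retries):
        raw = fake_llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # obvious failure: not JSON at all, easy to catch
        # Subtle failure: valid JSON that violates the schema we asked for.
        if isinstance(data.get("population"), int):
            return data
        # Otherwise retry; in practice you'd also tighten the prompt here.
    raise ValueError(f"no schema-valid response after {max_retries} attempts")

if __name__ == "__main__":
    try:
        print(ask_for_json("Return JSON with city (str) and population (int)."))
    except ValueError as e:
        print("model never produced valid output:", e)
```

The first branch (garbage that won't parse) is trivial to handle; the second (well-formed but wrong) is the kind of error that slips past anyone treating the model as an independent agent, and it only gets caught here because the caller checks.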
Second, whether they're perfect at following commands is beside the point. They're not just "predicting tokens," in the same way that you're not just "sending electrochemical signals." LLMs think, solve problems, answer questions, write code, etc.