It responds with the statistically most probable text based on its training data, which happens to differ depending on whether the errors are present. I suspect high-fidelity diagramming requires a different attention architecture from the common ones used in sentence-optimized models.
It provides both syntax guides and syntax/semantic analysis as MCP Tools, so you can have an agent iteratively refine diagrams with good context for patterns like multi-line text and comments (LLMs love end-of-line comments, but Mermaid.js often doesn’t).
What instance of ChatGPT are you doing that with? (Reasoning?)
I've noticed the same thing when building an agentic loop: if the model outputs a syntax error, just feed the error back to the LLM automatically and give it a second chance. It dramatically increases the success rate.
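For anyone who wants to try it, here is a minimal sketch of that retry loop. It is not tied to any particular SDK: `generate` and `validate` are hypothetical callables you would supply yourself (one wrapping your LLM call, one wrapping whatever parser or linter you use), and the wording of the retry prompt is just an assumption.

```python
from typing import Callable, Optional


def refine_until_valid(
    prompt: str,
    generate: Callable[[str], str],            # hypothetical: wraps your LLM call
    validate: Callable[[str], Optional[str]],  # hypothetical: returns an error message, or None if valid
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first output that validates, or None if every attempt fails."""
    current_prompt = prompt
    for _ in range(max_attempts):
        output = generate(current_prompt)
        error = validate(output)
        if error is None:
            return output
        # Feed the failed output and the error message back to the model.
        current_prompt = (
            f"{prompt}\n\nYour previous answer:\n{output}\n\n"
            f"It failed with this syntax error:\n{error}\n\nPlease fix it."
        )
    return None
```

Capping the attempts matters: if the model keeps repeating the same mistake, you want the loop to give up rather than burn tokens indefinitely.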
Gemini, ChatGPT or Grok would find this a lot easier as they could generate an image inline, although IP restrictions might bite you. Even Grok wants to lecture on IP these days, but at least it's fairly trivial to jailbreak.