I think they also tend to be generating non-C++ code, where there are more guardrails and fewer footguns for LLMs to run into. E.g. they're generating JavaScript or Python or Rust, where type systems or garbage collection eliminate entire classes of mistakes that LLMs can make. I know you said you don't use it for Python because you know the language, but even experienced Python devs still see value in LLM-generated Python code.
My worry about an agent is that I’m trying to translate the math with full fidelity, and an agent might take liberties with the math rather than reproduce it exactly. I’m already having issues with 0- vs 1-based indexing screwing up some of the algorithm.
But I will try an agent - can’t hurt to try
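For what it's worth, the indexing issue is the kind of thing a tiny test can catch, whoever writes the translation. A minimal sketch in Python, assuming a made-up textbook formula S = sum_{i=1}^{n} i * x_i (not the actual algorithm; every function name here is hypothetical): keep one version that mirrors the 1-indexed math literally, then check the 0-based rewrite against it.

```python
def weighted_sum_textbook(x):
    """Literal 1-indexed form of the hypothetical formula S = sum_{i=1}^{n} i * x_i."""
    padded = [None] + list(x)  # padded[1] holds x_1; padded[0] is never read
    n = len(x)
    return sum(i * padded[i] for i in range(1, n + 1))


def weighted_sum_zero_based(x):
    """Same formula rewritten for 0-based storage: x_i lives at x[i - 1]."""
    return sum((k + 1) * x[k] for k in range(len(x)))


def check_translation():
    """Compare both versions on a few inputs, including the empty and length-1 edge cases."""
    for x in ([], [5.0], [1.0, 2.0, 3.0], [0.5, -1.5, 2.0, 4.0]):
        assert weighted_sum_textbook(x) == weighted_sum_zero_based(x)
    return True
```

The padded version is deliberately dumb: it reads like the page of the textbook, so it's easy to eyeball, and any off-by-one in the "real" version shows up as a failed assert instead of a subtly wrong result.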
(But for real, a good test suite seems like a great place to start before letting an LLM run wild... or alternatively just do what you're doing. We definitely respect textbook-readers more than prompters!)
Also let this be a lesson to internet folks to be careful what you post if your boss shitposts on the orange yelling site