If you have to do the reasoning and tell the LLM the results of your reasoning before it can generate the code you want, surely that tells you the LLM isn’t reasoning. Agentic workflows hide some of it, but anyone who’s interacted even a little with an LLM can tell they’re not reasoning, no matter how OpenAI and Anthropic label their models.
That would not surprise me.
Structure exists for a reason, and I say that as someone who loves to go into deep hack and produce some ultra clever jamboozle that works spectacularly well, as long as you don’t ever have to touch it. In production, there is no worse code than clever code. It’s soul sucking, but we have to make peace with elegance = maintainability / portability. Often, that means 30 LOC instead of ten, but future you thanks you, and the (modern, optimised) compiler doesn’t care.