Equating "junior developers" and "coding LLMs" is pretty lame. You handhold a junior developer so that, eventually, you don't have to handhold anymore. The junior developer is expected to learn enough, and be trusted enough, to operate more autonomously. "Junior developers" don't exist solely to do your bidding. It may be valuable to recognize similarities between a first junior developer interaction and a first LLM interaction, but when every LLM interaction requires handholding, you get nothing equivalent to the value that builds up from having a junior developer work alongside you.
I simply said the description of the problem should be broken down similar to the way you’d do it for a junior developer, as opposed to the way you’d express the problem to a more senior developer who can be trusted to figure out the right way to do it at a higher level.
What’s giving too much autonomy about “Please load settings.toml using a library and print out the name key from the application table”? Even if it’s underspecified, surely it should at least leave it _compiling_?
I’ve been posting comments like this monthly here. My experience has consistently been this with Claude, opencode, Antigravity, Cursor, and using GPT/Opus/Sonnet/Gemini models (latest at time of testing). This morning it was Opus 4.6.
Are you using Claude Code? Do you have it configured so that it is not allowed to run the build? Because I've observed that Claude Code is extremely good at making sure the code compiles, because it'll run a compile and address any compile errors as part of the work.
I just asked it to build a TOML example program in DotNet using Tomlyn, and when it was done I was able to run "./bin/Debug/net8.0/dotnettoml example.toml". It had already built it for me (I watched it run the build step as part of its work, as I mentioned it would do above).
> I’ve observed Claude code is extremely good at making sure the code compiles
My observation is that it’s fine until it’s absolutely not, and the agentic loop fails.
I don't know that it's useful to assign blame here.
It probably is to your benefit, if you are a coding professional, to understand why your results are so drastically different from what others are seeing. You started this thread saying "I keep getting told I'll be amazed at what it can do, but the tools keep failing at the first hurdle."
I'm telling you that something is wrong, that is why you are getting poor results. I don't know what is wrong, but I've given you an example prompt and an example output showing that Claude Code is able to produce the exact output you were looking for. This is why a lot of people are saying "you'll be amazed at what it can do", and it points to you having some issue.
I don't know if you are running an ancient version of Claude Code, or if you are not using Opus 4.6, or if you are not using "high" effort (those are what I'm using to get the results I posted elsewhere in reply to your comment), but something is definitely wrong. Some of what may be wrong is that you don't have enough experience with the tooling, which I'd understand if you are getting poor results; you have little (immediate) incentive to get more proficient.
As I said, I was able to tell Claude Code to do something like the example you gave, and it did it and it built, without me asking, and produced a working program on the first try.
Oh - I’m blaming Claude, not anyone else. I tried again this evening, and the same prompt (in the same directory on the same project) worked.
> i don’t know if you’re using an ancient version of Claude code,
I’m on a version from some time last week, and using Opus 4.6.
> This is why a lot of people are saying "you'll be amazed at what it can do", and it points to you having some issue.
If you look at my comments in these threads, I’ve had these issues and been posting about this for months. I’m still being told “you’re using the wrong model or the wrong tool or you’re holding it wrong”, and yet here I am.
I’m using plan mode and clearly breaking down tasks, and this happens to me basically every time I use the damn tool. Speaking to my team at work and friends in other workplaces, I hear the same thing. And yet apparently we’re all just using it wrong or doing something wrong.
Honestly, I genuinely think the people who are not having these experiences just… don’t notice that they are.
We’ve gone from “I’m baffled at your experience” to “well yeah, it often fails” in two sentences here…
I also clearly said I didn’t limit it to a single attempt. I gave it the compile error message, and it changed a different line. I told it the error was at the affected line and to check the docs. Claude Code then tried to query the DLL for the function, abandoned that, and then did something else incorrect.
I’m literally asking it to install a package and copy the example from the README.