> it happens when they give Claude too much autonomy. It works better when you tell it what to do, rather than letting it decide. That can be at a pretty high level, though. Basically reduce the problem to a set of well-established subproblems that it’s familiar with. Same as you’d do with a junior developer, really.

Equating "junior developers" and "coding LLMs" is pretty lame. You handhold a junior developer so that, eventually, you don't have to handhold anymore. The junior developer is expected to learn enough, and be trusted enough, to operate more autonomously. "Junior developers" don't exist solely to do your bidding. It may be valuable to recognize similarities between a first junior developer interaction and a first LLM interaction, but when every LLM interaction requires it to be handheld, the value of the iterative nature of having a junior developer work alongside you is not at all equivalent.

reply
I didn’t say they are equivalent, nor do I in any way consider them equivalent. One is a tool, the other is a person.

I simply said the description of the problem should be broken down similar to the way you’d do it for a junior developer. As opposed to the way you’d express the problem to a more senior developer who can be trusted to figure out the right way to do it at a higher level.

reply
> I have noticed some people I work with have more trouble, and my vague intuition is it happens when they give Claude too much autonomy

What’s giving too much autonomy about

“Please load settings.toml using a library and print out the name key from the application table”? Even if it’s underspecified, surely it should at least leave it _compiling_?

I’ve been posting comments like this here monthly; my experience has consistently been the same with Claude, opencode, antigravity, and cursor, using gpt/opus/sonnet/gemini models (latest at time of testing). This morning it was Opus 4.6.

reply
> Even if it’s underspecified, surely it should at least leave it _compiling_?

Are you using Claude Code? Do you have it configured so that you are not allowing it to run the build? Because I've observed that Claude Code is extremely good at making sure the code compiles, because it'll run a compile and address any compile errors as part of the work.

I just asked it to build a TOML example program in DotNet using Tomlyn, and when it was done I was able to run "./bin/Debug/net8.0/dotnettoml example.toml", it had already built it for me (I watched it run the build step as part of its work, as I mentioned it would do above).

reply
I am using Claude Code. I didn’t explicitly tell it what the build command was (it’s dotnet build), and it didn’t ask. That’s not my fault.

> I’ve observed Claude code is extremely good at making sure the code compiles

My observation is that it’s fine until it’s absolutely not, and the agentic loop fails.

reply
> That's not my fault.

I don't know that it's useful to assign blame here.

It probably is to your benefit, if you are a coding professional, to understand why your results are so drastically different from what others are seeing. You started this thread saying "I keep getting told I'll be amazed at what it can do, but the tools keep failing at the first hurdle."

I'm telling you that something is wrong, that is why you are getting poor results. I don't know what is wrong, but I've given you an example prompt and an example output showing that Claude Code is able to produce the exact output you were looking for. This is why a lot of people are saying "you'll be amazed at what it can do", and it points to you having some issue.

I don't know if you are running an ancient version of Claude Code, if you are not using Opus 4.6, or if you are not using "high" effort (those are what I'm using to get the results I posted elsewhere in reply to your comment), but something is definitely wrong. Some of what may be wrong is that you don't have enough experience with the tooling, which I'd understand if you are getting poor results; you have little (immediate) incentive to get more proficient.

As I said, I was able to tell Claude Code to do something like the example you gave, and it did it and it built, without me asking, and produced a working program on the first try.

reply
> I don’t know that it’s useful to assign blame here

Oh - I’m blaming Claude not anyone else. I’ve tried again this evening and the same prompt (in the same directory on the same project) worked.

> i don’t know if you’re using an ancient version of Claude code,

I’m on a version from some time last week, and using opus 4.6

> This is why a lot of people are saying "you'll be amazed at what it can do", and it points to you having some issue.

If you look at my comments in these threads, I’ve had these issues and been posting about this for months. I’m still being told “you’re using the wrong model or the wrong tool or you’re holding it wrong” but yet, here I am.

I’m using plan mode, clearly breaking down tasks, and this happens to me basically every time I use the damn tool. Speaking to my team at work and friends in other workplaces, I hear the same thing. But yet we’re just using it wrong or doing something wrong.

Honestly, I genuinely think the people who are not having these experiences just… don’t notice that they are.

reply
Not even the worst possible prompt would explain your unusual experience, so I don't think that's it either.
reply
There’s nothing wrong with it that I can see. Like I said, I’m a bit baffled at your experience. I will say, it’s not unusual for the initial output not to compile, but usually one short iteration later that’s fixed. Claude Code will usually even do that iteration by itself.
reply
> I will say, it’s not unusual for the initial output not to compile,

We’ve gone from “I’m baffled at your experience” to “well yeah it often fails” in two sentences here…

reply
Hmm…if you’re giving only one prompt to Claude Code, and allowing it only one output, then I’m no longer baffled at why you’re not getting good results. That’s not how it works. (That’s not how it works when I write code myself, either!)
reply
I mean, I don’t know how much less scope I can give it. The next step is writing the 5 lines of code I want it to write.

I also clearly said I didn’t allow it one output: I gave it the compile error message, it changed a different line, I told it the error was at the affected line and to check the docs. Claude Code then tried to query the DLL for the function, abandoned that, and then did something else incorrect.

I’m literally asking it to install a package and copy the example from the readme

reply
It's not unusual for my initial output (as a programmer) not to compile either. I wouldn't say I "failed" if I can then get it to compile. Which as people are saying, is what happens with Claude Code and Opus, either automatically or at most when I say "get it to compile".
reply
But when it doesn’t compile for me, I don’t claim it’s finished.
reply
Similar. I regularly use GitHub Copilot (with Claude models sometimes) and it works amazingly. But I see some who struggle with them. I have sort of learned to talk to it, understand what it is generating, and routinely use it to generate fixes, whole features, etc. much, much faster than I could before.
reply