by fnands 15 hours ago

by maccard 15 hours ago

I’m not OP but every time I post a comment with this sentiment I get told “the latest models are what you need”. If every 3 months you are saying “it’s ready as long as you use the latest model”, then it wasn’t ready 3 months ago and it’s not likely to be ready now.

To answer your question, I’ve tried both Claude Code and Antigravity in the last 2 weeks and I’m still finding that they struggle. AG with Gemini regularly gets stuck on simple issues and loops until I run out of requests, and Claude still regularly goes on wild tangents without actually solving the problem.

by anon7000 14 hours ago

I don’t think that’s true. Claude Opus 4.5/4.6 in Cursor have marked the big shift for me. Before that, agentic development mostly made me want to just do it myself, because it was getting stuck or going on tangents.

I think it can (and is) shifting very rapidly. Everyone is different, and I’m sure models are better at different types of work (or styles of working), but it doesn’t take much to make it too frustrating to use. Which also means it doesn’t take much to make it super useful.

by maccard 12 hours ago

> I don’t think that’s true. Claude Opus 4.5/4.6 in Cursor.

Opus 4.6 has been out for less than a month. If it were a big shift, surely we'd see a massive difference over 4.5, which was November. I think this proves the point: you're not seeing seismic shifts every 3 months, and you're not even clear about which model was the fix.

> I think it can (and is) shifting very rapidly.

Shifting, maybe. But shuffling deck chairs every 3 months.

by thunky 11 hours ago

I interpreted their comment to mean 4.5 was the shift, which was Nov last year. "Before that" meaning pre-4.5.

by fendy3002 13 hours ago

It depends on what you're handling. Frontend (not CSS), Swagger, and mundane CRUD are where it shines. Anything more complex that needs harder calculation usually makes the agents struggle.

It's especially good for navigating code you're unfamiliar with. If you already know the code well, you'll find it's usually faster to debug and code by yourself.

Opus 4.6 with the Claude Code VS Code extension.

by sergiosgc 14 hours ago

Have you tried it with something like OpenSpec? Strangely, taking the time to lay out the steps in a large task helps immensely. It's the difference between the behavior you describe and just letting it run productively for segments of ten or fifteen minutes.

by maccard 12 hours ago

> Have you tried it with something like OpenSpec?

No. The parent comment said I needed a new model, which I've tried. Being told "just try something else as well" kind of proves the point.

by edgyquant 12 hours ago

I thought this too, and then I discovered plan mode. If you just prompt agent mode it will be terrible, but coming up with a plan first has really made a big difference, and I rarely write code at all now.

by ramoz 5 hours ago

My workflow has become very plan-intensive... including planning of verification+test steps at the end.

by techpression 14 hours ago

Agree, it’s strange. I will just assume that the people who say this are building React apps. I still get so much “certainly, I should not do this in a completely insane way, let me fix that” … -400+2. It’s not always, and it is better than it was, but that’s it.

by fnands 12 hours ago

I'm an ML engineer, so it's mostly been setting up data processing/training code in PyTorch, if that helps.

by fragmede 12 hours ago

At this point though, after Claude C Compiler, you've got to give us more details to better understand the dichotomy. What do you consider simple issues?

by maccard 12 hours ago

> At this point though, after Claude C Compiler,

Perfect example. You mean the C compiler that literally failed to compile a hello world [0] (which was given in its README)?

> What do you consider simple issues?

Hallucinating APIs for well documented libraries/interfaces, ignoring explicit instructions for how to do things, and making very simple logic errors in 30-100 line scripts.

As an example, I asked Claude Code to help me with a Roblox game last weekend, and specifically asked it to "create a shop GUI for <X> which scales with the UI, and opens when you press E next to the character". It proceeded to create a GUI with absolute sizing and get stuck on an API hallucination for handling input, and when I got it unstuck, the result didn't actually work.

[0] https://github.com/anthropics/claudes-c-compiler/issues/1

by sarchertech 12 hours ago

Claude C compiler is 100k LOC that doesn’t do anything useful, and cost $20k plus the cost of an expert engineer creating a custom harness and babysitting it.

But the most important thing is that they were reverse engineering gcc by using it as an oracle. And it had gcc and thousands of other C compilers in its training set.

So if you are a large corporation looking to copy GPL code so that you can use it without worrying about the license, and the project you want to copy is a text transformer with a rigorously defined set of inputs and outputs, have at it.

by benrutter 15 hours ago

> When was the last time you tried?

Pretty recently (a couple weeks ago). I give agentic workflows a go every couple of weeks or so.

I should say, I don't find them abysmal, but I tend to work in codebases that I understand, along with their patterns, really well. The use cases I've tried so far do sort of work, just not (yet, at least) faster than I'm able to actually write the code myself.

by darkwater 13 hours ago

> My experience with what coding assistants are good for shifted from:

> smart autocomplete -> targeted changes/additions -> full engineering

Define "full engineering". Because if you say "full engineering" I would expect the agent to get some expected product output details as input and produce all by itself the right implementation for the context (i.e. company) it lives in.

by fnands 9 hours ago

I agree that "full engineering" was a bit broad. I should probably have said something like "agent-only coding"?

I.e. the point where the agent writes all the code and you just verify.

by darkwater 8 hours ago

The "you just verify" part can indeed take a lot of steering and hand-holding to get the right implementation for the current company/department/project context. Otherwise you might just be generating tech debt at scale.
