It's worse with renaming things in code. I've yet to see an agent use refactoring tools (if they even exist in VS Code) instead of brute-forcing renames with string replacement or sed. Agents go edit -> build -> read errors -> repeat instead of using a reliable tool, and it burns a lot more GPU...
When using codex, I usually have something like "Never add 3rd party libraries unless explicitly requested. When adding new libraries, use `cargo add $crate` without specifying the version, so we get the latest version." in my instructions, and it seems to prevent this issue entirely.
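For what it's worth, the point of omitting the version is that `cargo add` resolves it against the registry at the moment you run it, whereas a version the agent types out itself usually comes from its training data. A minimal sketch of the resulting Cargo.toml entry (crate name and version are just examples):

```toml
[dependencies]
# `cargo add serde` fills in whatever the latest published version is today
serde = "1.0.219"
```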
Though that is, at least to me, a bit of an anti-pattern for exactly that reason. I've found it far more successful to blow away the context and restart with a new prompt distilled from the old context, instead of having a very long-running back-and-forth.
It's better than it was with the latest models; I can have them stick around longer, but it's still a useful pattern even with 4.6/5.3.
That's their strategy for everything the training data can't solve. This is the main reason the autonomous agent swarm approach doesn't work for me: 20 bucks in tokens obliterated by 5 agents exchanging hallucinations with each other. It's way too easy for them to amplify each other's mistakes without a human to intervene.
On the second point, I totally agree. I keep hoping that agents will get better at refactoring, and I think using LSPs effectively is what would make that happen. Claude took dozens of minutes to perform a rename that JetBrains would have executed perfectly in about five seconds. Its approach was: make a change, run the tests, do it again. Nuts.
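For what it's worth, the LSP already exposes exactly this: a single `textDocument/rename` request returns a WorkspaceEdit covering every file that references the symbol, and the editor applies it atomically. A minimal sketch of what that request looks like on the wire (the file path, position, and new name are made up):

```python
import json

# Hypothetical rename: the cursor sits on the old identifier in src/lib.rs and
# we ask the language server for a project-wide rename to "parse_config".
rename_request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "textDocument/rename",
    "params": {
        "textDocument": {"uri": "file:///project/src/lib.rs"},
        "position": {"line": 10, "character": 4},  # zero-based, per the spec
        "newName": "parse_config",
    },
}

# LSP messages are framed with a Content-Length header over stdio.
body = json.dumps(rename_request)
print(f"Content-Length: {len(body.encode('utf-8'))}\r\n\r\n{body}")
```

The server answers with the full set of edits across the workspace, so there's no edit -> build -> read errors loop at all.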
Think about what a developer would do:

- check the latest version online;
- look at the changelog;
- evaluate whether the upgrade is worth it, or whether an intermediate version is enough if code changes are needed.
Of course, you can keep these operations in human hands, but if you really want to automate this part (and are ready to pay the consequences) you need to mimic the same workflow. I use Gemini and codex to look up package version information online and check the changelogs from the version I'm on to the one I'd like to upgrade to; I spawn a Claude Opus subagent to check whether anything in the code needs to change. For major releases, I git clone the two package versions and another subagent checks whether the interfaces I use have changed. Finally, I run all my tests and verify everything's alright.
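At least the "check the latest version online" step is trivially scriptable rather than something to delegate to a model; a sketch against the public crates.io API (assuming Rust crates as upthread; the response field name is from memory, so verify it before relying on this):

```python
import json
import urllib.request

def latest_version(crate: str) -> str:
    # crates.io asks API clients to identify themselves with a User-Agent.
    url = f"https://crates.io/api/v1/crates/{crate}"
    req = urllib.request.Request(url, headers={"User-Agent": "upgrade-checker"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["crate"]["max_version"]  # assumed field name in the v1 response

current = "1.0.150"  # whatever Cargo.lock says today (example value)
latest = latest_version("serde")
if latest != current:
    print(f"serde: {current} -> {latest}, read the changelog before bumping")
```

The changelog reading and interface diffing are where the subagents actually earn their keep.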
Yes, it still might not be perfect, but neither am I.
The AI hasn't understood what's going on; instead it has pattern-matched strings and used those patterns to create new strings that /look/ right but fail upon inspection.
(The human involved is also failing my Turing test...)