But you can tell it to once (in CLAUDE.md, for example) and it will nearly every time (it's getting much better at that). Since Opus 4.7 (which I consider a downgrade overall) it's been much better at following CLAUDE.md. I even have an intentional contradiction between my user-level CLAUDE.md and the project-level ones, so I can tell which one is taking precedence or whether both are being disregarded. It follows at least one of them most of the time, and it follows the local one 95% of the time.

by ben_w 4 hours ago

While they absolutely do fail as you say (though in my experience not by default), this failure mode is still a massive improvement over the frequent human case of guessing based on the function/class/property/argument names.

Now, a really good human collaborator who reads all the stuff and thinks carefully, that was still better than what I saw from AI models at the start of this year. But I've also worked with my share of idiots, and been one too.

I'm not going to get into whether *current* models can or can't reliably do any particular thing to any particular standard. Previously my comparison was to the same kind of conversation about video game graphics in the 90s, which were always being called "photorealistic" when they really weren't*. Now I'm starting to feel such discussions have the same vibes as Tesla fans insisting that "FSD-{insert current version here} solves all the problems and is a real breakthrough and the Robotaxi will totes conquer the marketplace this time for real bro, just one more version bro", etc.

* https://archive.org/details/nextgen-issue-26

by serf 4 hours ago

if you find yourself saying 'if you tell it to' a lot about LLMs that usually just says something about your prompting methods.

or, in other words, if you want the thing to always read the documentation, then make that a strongly highlighted point in pre-prompts, active prompts, and memory alike.

by fer 2 hours ago

It needn't be always or never. It should use its best judgement based on the .md, .toml, or whatever you use; in the end it's up to the LLM to decide which registered tools/MCPs are used, and if the LLM is confident about some bs, it will go with that confidence instead of the tool.

When people complain about it, it's more often a gap between different knowledge domains and hard-to-measure characteristics of the environment than it is an actual "you're using it wrong".

by moregrist 4 hours ago

Sometimes you get lucky and it both looks up the documentation and then ignores it and makes stuff up.