undefined

points

[-]

> when the model won't actually be able to provide one

This is key. In my experience, asking an LLM why it did something is usually pointless. In a subsequent round, it generally can't meaningfully introspect on its prior internal state, so it's just referring to the session transcript and extrapolating a plausible sounding answer based on its training data of how LLMs typically work.

That doesn't necessarily mean the reply is wrong because, as usual, a statistically plausible sounding answer sometimes also happens to be correct, but it has no fundamental truth value. I've gotten equally plausible answers just pasting the same session transcript into another LLM and asking why it did that.

by matheusmoreira2 hours ago|

prev|

[-]

That's good advice. I managed to get the session back on track by doing that a few turns later. I started making it very explicit that I wanted it to really think things through. It kept asking me for permission to do things, I had to explicitly prompt it to trace through and resolve every single edge case it ran into, but it seems to be doing better now. It's running a lot of adversarial tests right now and the results at least seem to be more thorough and acceptable. It's gonna take a while to fully review the output though.

It's just that Opus 4.6 DISABLE_ADAPTIVE_THINKING=1 doesn't seem to require me to do this at all, or at least not as often. It'd fully explore the code and take into account all the edge cases and caveats without any explicit prompting from me. It's a really frustrating experience to watch Anthropic's flagship subscription-only model burn my tokens only to end up lazily hand-waving away hard questions unless I explicitly tell it not to do that.

I have to give it to Opus 4.7 though: it recovered much better than 4.6.

by j-bos1 hours ago|

prev|

[-]

Yeah for anyone seriously using these models I highly reccomend reading the Mythos system card, esp the sections on analyzing it's internal non verbalized states. Save a lot of head wall banging.

by nelox1 hours ago|

prev|

[-]

Precisely. I find Grok’s multi-agent approach very useful here. I have custom agent configured as a validator.