I think the problem just comes down to adaptive thinking allowing the model to choose how much effort it spends on things, a power which it promptly abuses to be as lazy as possible. CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 significantly improved Opus 4.6's behavior and the quality of its results. But then what do they do when they release 4.7?
https://code.claude.com/docs/en/model-config
> Opus 4.7 always uses adaptive reasoning.
> The fixed thinking budget mode and CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING do not apply to it.