We did both -- we did a number of UI iterations (eg. improving thinking loading states, making it more clear how many tokens are being downloaded, etc.). But we also reduced the default effort level after evals and dogfooding. The latter was not the right decision, so we rolled it back after finding that UX iterations were insufficient (people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this).
For instance:
Is Haiku supposed to hit a warm system-prompt cache in a default Claude code setup?
I had `DISABLE_TELEMETRY=1` in my env and found the haiku requests would not hit a warm-cached system prompt. E.g. on first request just now w/ most recent version (v2.1.118, but happened on others):
w/ telemetry off - input_tokens:10 cache_read:0 cache_write:28897 out:249
w/ telemetry on - input_tokens:10 cache_read:24344 cache_write:7237 out:243
I used to think having so many users was leading to people hitting a lot of edge cases, 3 million users is 3 million different problems. Everyone can't be on the happy path. But then I started hitting weird edge cases and started thinking the permutations might not be under control.
https://github.com/anthropics/claude-code/issues/36286 https://github.com/anthropics/claude-code/issues/25018
UI is UI. It is naive to expect that you build some UI but users will "just magically" find out that they should use it as a terminal in the first place.
Anthropic: removes thinking output
Users: see long pauses, complain
Anthropic: better reduce thinking time
Users: wtf
To me it really, really seems like Anthropic is trying to undo the transparency they always had around reasoning chains, and a lot of issues are due to that.
Removing thinking blocks from the convo after 1 hour of being inactive without any notice is just the icing on the cake, whoever thought that was a good idea? How about making “the cache is hot” vs “the cache is cold” a clear visual indicator instead, so you slowly shape user behavior, rather than doing these types of drastic things.
They had droves of Claude devs vehemently defending and gaslighting users when this started happening