
Hey, Boris from the team here.

We did both -- we did a number of UI iterations (eg. improving thinking loading states, making it more clear how many tokens are being downloaded, etc.). But we also reduced the default effort level after evals and dogfooding. The latter was not the right decision, so we rolled it back after finding that UX iterations were insufficient (people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this).

by big_toast17 hours ago|


Having a "Recovery Mode"/"Safe Boot" flag that disables our configurations (or progressively re-enables them) so we can see how Claude Code responds would be nice. Sometimes I worry that some old flag I set is breaking things. Maybe the flag already exists? I tried `claude doctor`, but it wasn't quite the solution.

For instance:

Is Haiku supposed to hit a warm system-prompt cache in a default Claude Code setup?

I had `DISABLE_TELEMETRY=1` in my env and found that the Haiku requests would not hit a warm-cached system prompt. E.g. on the first request just now w/ the most recent version (v2.1.118, but it happened on others too):

w/ telemetry off - input_tokens:10 cache_read:0 cache_write:28897 out:249

w/ telemetry on - input_tokens:10 cache_read:24344 cache_write:7237 out:243
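For anyone comparing the two runs above, here is a rough sketch of how the cache hit share could be computed from those four numbers. The `Usage` class and its field names are made up for illustration, and it assumes the three input-side counters (uncached input, cache read, cache write) are disjoint and add up to the full prompt size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counters copied from one request (hypothetical container)."""
    input_tokens: int   # uncached prompt tokens
    cache_read: int     # prompt tokens served from a warm cache
    cache_write: int    # prompt tokens written to the cache on this request
    output_tokens: int

    @property
    def prompt_tokens(self) -> int:
        # Assumption: the three input-side counters do not overlap.
        return self.input_tokens + self.cache_read + self.cache_write

    @property
    def cache_hit_ratio(self) -> float:
        total = self.prompt_tokens
        return self.cache_read / total if total else 0.0


# Numbers quoted in the comment above.
telemetry_off = Usage(input_tokens=10, cache_read=0, cache_write=28897, output_tokens=249)
telemetry_on = Usage(input_tokens=10, cache_read=24344, cache_write=7237, output_tokens=243)

for label, usage in (("telemetry off", telemetry_off), ("telemetry on", telemetry_on)):
    print(f"{label}: {usage.prompt_tokens} prompt tokens, "
          f"{usage.cache_hit_ratio:.1%} read from cache")
```

Under that assumption, the telemetry-off request reads nothing from cache and writes the whole prompt, while the telemetry-on request serves roughly three quarters of its prompt from cache.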

I used to think that having so many users simply meant people were hitting a lot of edge cases: 3 million users is 3 million different problems, and not everyone can be on the happy path. But then I started hitting weird edge cases myself and began to think the permutations might not be under control.

by krade14 hours ago|


Off topic, but I'm hoping you'll maybe see this. There's been an issue with the VS Code extension that makes it pretty much impossible to use (PreToolUse can't intercept permission requests anymore, and using PermissionRequest hooks always opens the diff viewer and steals focus):

https://github.com/anthropics/claude-code/issues/36286

https://github.com/anthropics/claude-code/issues/25018

by EugeneOZ17 hours ago|


> people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this

UI is UI. It is naive to expect that you build some UI but users will "just magically" find out that they should use it as a terminal in the first place.

by abtinf15 hours ago|


You didn’t anticipate most people stick with defaults?

by bcherny8 hours ago|


We anticipated the default would be the best option for most people. We were wrong, so we reverted the default.

by taytus14 hours ago|


“after evals and dogfooding”: couldn’t you have done this before releasing the model? We are paying $200/month to beta test the software for you.

by stingraycharles10 hours ago|


Yeah, this is so silly.

Anthropic: removes thinking output

Users: see long pauses, complain

Anthropic: better reduce thinking time

Users: wtf

To me it really, really seems like Anthropic is trying to undo the transparency they always had around reasoning chains, and a lot of issues are due to that.

Removing thinking blocks from the convo after 1 hour of inactivity, without any notice, is just the icing on the cake. Whoever thought that was a good idea? How about making “the cache is hot” vs “the cache is cold” a clear visual indicator instead, so you slowly shape user behavior rather than doing these kinds of drastic things?

by sekai7 hours ago|


> Instead of fixing the UI they lowered the default reasoning effort parameter from high to medium? And they "traced this back" because they "take reports about degradation very seriously"? Extremely hard to give them the benefit of doubt here.

They had droves of Claude devs vehemently defending this and gaslighting users when it started happening.