undefined

points

[-]

I've been talking to friends about this extensively, and read all sorts of different social media posts on X where people deep dove things (I'm at work so I don't have any links handy - though I did submit one on HN, grain of salt, unsure how valid it is but it was interesting: https://news.ycombinator.com/item?id=47752049 ).

I think the real issue stems from the 1 Million token context window change. They did not anticipate the amount of load it would give you. That first few days after they released the new token window, I was making amazing things in one single session from nothing, to something (a new .NET based programming language inspired by Python, and a Virtual Actor framework in Rust). I think since then they've been trying too many things to tweak things, whilst irritating their users.

They even added a new "Max" thinking mode, and made "High" the old medium, which is ridiculous because you think you're using "High" but really you're not. There's a hidden config file to change their terrible defaults to let Claude be smarter still, and apparently you can toggle off the 1M tokens.

I think the real fix, and I'm surprised nobody there has done this yet, is to let the user trim down their context window.

Think about it, you used to have what? 350k tokens or so? Now Claude will keep sending your prompt from 30 minutes ago that's completely irrelevant to the back-end, whereas 3 months ago it would have been compacted by now.

Others have noted that similar prompting for some ungodly reason adds tens of thousands of extra garbage tokens (not sure why).

Edit looks like someone figured out that if you downgrade your version of Claude Code and change one single setting it unruins Claude:

https://news.ycombinator.com/item?id=47769879

by SkyPuncher3 hours ago|

parent|

[-]

Yea, I've realized that if I stay under 200k tokens I basically don't have usage issues any more.

A bit annoying, but not the end of the world.

by consumer4511 hours ago|

parent|

[-]

super-edit: Sorry, this is not a usage related question, I have move it to: https://news.ycombinator.com/item?id=47772971

Here is the question for which I cannot find an answer, and cannot yet afford to answer myself:

In Claude Code, I use Opus 4.6 1M, but stay under 250k via careful session management to avoid known NoLiMa [0] / context rot [1] crap. The question I keep wanting answered though: at ~165k tokens used, does Opus 1M actually deliver higher quality than Opus 200k?

NoLiMa would indicate that with a ~165k request, Opus 200k would suck, and Opus 1M would be better (as a lower percentage of the context window was used)... but they are the same model. However, there are practical inference deployment differences that could change the whole paradigm, right? I am so confused.

Anthropic says it's the same model [2]. But, Claude Code's own source treats them as distinct variants with separate routing [3]. Closest test I found [4] asserts they're identical below 200K but it never actually A/B tests, correct?

Inside Claude Code it's probably not testable, right? According to this issue [5], the CLI is non-deterministic for identical inputs, and agent sessions branch on tool-use. Would need a clean API-level test.

The API level test is what I really want to know for the Claude based features in my own apps. Is there a real benchmark for this?

I have reached the limits of my understanding on this problem. If what I am trying to say makes any sense, any help would be greatly appreciated.

If anyone could help me ask the question better, that would also be appreciated.

[0] https://arxiv.org/abs/2502.05167

[1] https://research.trychroma.com/context-rot

[2] https://claude.com/blog/1m-context-ga

[3] https://github.com/anthropics/claude-code/issues/35545

[4] https://www.claudecodecamp.com/p/claude-code-1m-context-wind...

[5] https://github.com/anthropics/claude-code/issues/3370

by onenite1 hours ago|

parent|

[-]

2 parent comments above say that you can use older version of claude code with opus 200k to compare. my guess is that eventually you’ll be able to set it in model settings yourself

by dacox5 hours ago|

parent|

prev|

[-]

Yeah, I have been seeing lots of comments, tweets, etc, but given everything I have learned about these models - i do not think the change to 1M was innocuous. I'm not sure what they've claimed publicly, but I'm fairly certain they must be doing additional quantization, or at minimum additional quantization of the KV cache. Plus, sequence length can change things even when not fully utilized. I had to manually re-enable the "clear context and continue" feature as well.

by giancarlostoro4 hours ago|

parent|

[-]

I used the heck out of it when it was announced, and it felt like I was using one of the best models I've ever used, but then so were all of their other customers, I don't think they accounted for such heavy load, or maybe follow up changes goofed something up, not sure. Like I said, the 1M token, for the first few days allowed me to bust out some interesting projects in one session from nothing to "oh my" in no time.

I'm thinking they should go back to all their old settings and as a user cap you at their old token limit, and ask you if you want to compact at your "soft" limit or burst for a little longer, to finish a task.

by imhoguy5 hours ago|

prev|

[-]

AI race to the bottom is a debt game now. Once the party is over somebody will have to pay the bill.

by timacles2 hours ago|

parent|

[-]

It’s going to be crazy with the explanation they come up with why the us public has to pay to bail out AI for national security.

In a way, it’s true if china has superior AI then it’s dominance over US will materialize. But it’s not hard to see how this scenario is being used to essential lie and scam into trillions of debt.

Its interesting how the cutthroat space of big tech has manifested into an incidious hyper capitalist system where disrupting a system is it’s primary function. The system in this case is world order and western governments

by joquarky1 hours ago|

parent|

[-]

"Move fast and break things" broke containment to the tech industry. Now you can see it everywhere.

by breakingcups6 hours ago|

prev|

[-]

You seem to be vouched for now, no longer dead for me.

by minimaxir6 hours ago|

parent|

[-]

Hmm, I can't edit the original comment to retract that edit either. Either my account is flagged for something or HN is being weird.