Anthropic silently downgraded cache TTL from 1h → 5M on March 6th

(github.com)

114 points by lsdmtme 4 hours ago

by sunaurus 1 hour ago

Has anybody else noticed a pretty significant shift in sentiment when discussing Claude/Codex with other engineers since even just a few months ago? Specifically because of the secret/hidden nature of these changes.

I keep getting the sense that people feel like they have no idea if they are getting the product that they originally paid for, or something much weaker, and this sentiment seems to be constantly spreading. Like when I hear Anthropic mentioned in the past few weeks, it's almost always in some negative context.

by andai 44 minutes ago

Well, off the top of my head:

- Banning OpenClaw users (within their rights, of course, but bad optics)

- Banning 3rd party harnesses in general (ditto)

(claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)

- Lowering reasoning effort (and then showing up here saying "we'll try to make sure the most valuable customers get the non-gimped experience" (paraphrasing slightly xD))

- Massively reduced usage (apparently a bug?) The other day I got 21x more usage spend on the same task for Claude vs Codex.

- Noticed a very sharp drop in response length in the Claude app. Asked Claude about it and it mentioned several things in the system prompt related to reduced reasoning effort, keeping responses as brief as possible, etc.

It's all circumstantial but everything points towards "desperately trying to cut costs".

I love Claude and I won't be switching any time soon (though with the usage limits I'm increasingly using Codex for coding), but it's getting hard to recommend it to friends lately. I told a friend "it was the best option, until about two weeks ago..." Now it's up in the air.

by rlpb 27 minutes ago

> It's all circumstantial but everything points towards "desperately trying to cut costs".

I have been wondering if it's more geared at reducing resource usage, given that at the moment there's a known constraint on AI datacenter expansion capability. Perhaps they are struggling to meet demand?

by joshstrange 31 minutes ago

> (claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)

100% this, I’ve posted the same sentiment here on HN. I hate the chilling effect of the bans and the lack of clarity on what is and is not allowed.

by politelemon 31 minutes ago

Why were third party harnesses banned? Surely they'd want sticking power over the ecosystem.

by cedws 17 minutes ago

There’s the argument that Anthropic has built Claude Code to use the models efficiently, which the subscription pricing is based on.

Maybe there’s some truth to that, but then why haven’t OpenAI made the same move? I believe the main reason is platform control. Anthropic can’t survive as a pipeline for tokens, they need to build and control a platform, which means aggressively locking out everybody else building a platform.

by risyachka 41 minutes ago

> apparently a bug?

it's a bug only if they get a harsh public response, otherwise it becomes a feature

by esperent 38 minutes ago

> claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked

I've used it with a sub a lot. Concurrency of 40 writing descriptions of thousands of images, running for hours on sonnet.

I have a lot of complaints. I've cancelled my $200 subscription and when it runs out in a few days I'll have to find something else.

But claude -p is fine.

... Or it was 2 weeks ago. Who knows if they've silently throttled it by now?

by zazibar 34 minutes ago

A month ago the company I work at with over 400 engineers decided to cancel all IDE subscriptions (Visual Studio, JetBrains, Windsurf, etc.) and move everyone over to Claude Code as a "cost-saving measure" (along with firing a bunch of test engineers). There was no migration plan - the EVP of Technology just gave a demo showing 2 greenfield projects they'd built with Claude Opus over a weekend and told everyone to copy how he worked. A week later the EVP had to send out an email telling people to stop using Opus because they were burning through too many tokens.

Claude seems to be getting nerfed every week since we've switched. I wonder how our EVP is feeling now.

by dickersnoodle 2 minutes ago

Hopefully that EVP feels embarrassed that a big bet was made that not only didn't pay off but left the company in a worse position. Some schadenfreude may be all you can expect, since this is an executive.

by jakobnissen 1 hour ago

Yeah I’ve seen this too. It’s difficult for me to tell if the complaints are due to a legitimate undisclosed nerf of Claude, or whether it’s just the initial awe of Opus 4.6 fading and people increasingly noticing its mistakes.

by babaganoosh89 50 minutes ago

It's not just you, there is a github issue for it: https://github.com/anthropics/claude-code/issues/42796

by PunchyHamster 58 minutes ago

Both can be a thing at the same time

by iLoveOncall 1 hour ago

I think there's a much more nefarious reason that you're missing.

It's pretty clear that OpenAI has consistently used bots on social networks to peddle their products. This could just be the next iteration, mass spreading lies about Anthropic to get people to flock back to their own products.

That would explain why a lot of users in the comments of those posts are claiming that they don't see any changes to limits.

by javawizard 55 minutes ago

The trouble with that argument, though, is that it works the other way as well: how do I, a random internet citizen, know that you're not doing the same thing for Anthropic with this comment?

(FWIW I have definitely noticed a cognitive decline with Claude / Opus 4.6 over the past month and a half or so, and unless I'm secretly working for them in my sleep, I'm definitely not an Anthropic employee.)

by iLoveOncall 50 minutes ago

Oh it's pretty clear to me that Anthropic employs the same tactics and uses bots on socials to push its products too. On Reddit a couple of months ago it was simply unbearable with all the "Claude Opus is going to take all the jobs".

You definitely shouldn't trust me, as we're way beyond the point where you can trust ANYTHING on the internet that has a timestamp later than 2021 or so (and even then, of course people were already lying).

Personally I use Claude models through Bedrock because I work for Amazon, and I haven't noticed any decline. Instead it's always been pretty shit, and what people describe now as the model getting lost in infinite loops of talking to itself has happened since the very start for me.

by hirako2000 1 hour ago

Judging from the number of GitHub issues on Anthropic, shamelessly being dismissed as "fixed", I doubt OpenAI needs the bots to tarnish that competitor.

by kingkongjaffa 1 hour ago

Just one more anecdote:

I'm on the enterprise team plan so a decent amount of usage.

In March I could use Opus all day and it was getting great results.

Since the last week of March and into April, I've had sessions where I maxed out session usage under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of "But wait, actually I need to do x" with slight variations of the same realisation.

This is not the 'thinking effort' setting in Claude Code: I noticed this happening across multiple sessions with the same thinking effort settings. There was clearly some unpublished underlying change that made the model get stuck in thinking loops longer and more often, without any escape hatch to stop and prompt the user for additional steering when it gets stuck.

by adahn 54 minutes ago

I’ve seen the point raised elsewhere that this could be the double usage promo that was available from the 13th of March to the 28th, i.e. people getting used to the promo and then feeling impacted when it finished.

Although it seems that enterprise wasn’t included, so maybe not in your case.

https://support.claude.com/en/articles/14063676-claude-march...

by cyanydeez 30 minutes ago

It sounds like (tinfoil hat) they reduced the quant size of their model and tried to mask the change with the promo. Your theory only addresses the spend, not the reduced reliability.

by UqWBcuFx6NV4r 1 hour ago

Whenever I see Opus say “but wait, …”—which is all the time—I get a little bit closer toward throwing my computer out the window. Sometimes I just collapse the thinking section, cross my fingers, and wait for the answer. It’s too frustrating watching the thinking process.

by pxtail 1 hour ago

There's still plenty of "leave my fellow multibillion corp alone" type ones; it means that corp can and should screw its loving customer base harder.

by simianwords 1 hour ago

The enshittification meme has been taken too seriously to the point where it is shoehorned into every single place possible.

It is not in Anthropic's interest to screw its customer base. Running a frontier lab comes with tradeoffs between training, inference and other areas.

by officialchicken 41 minutes ago

The investors are their customers - not the users of the end-product.

by simianwords 21 minutes ago

This shows a lack of understanding of how markets work. Investors make money when the valuation of the company increases. The valuation of the company is the best prediction of risk-adjusted future profit.

How would Anthropic increase future profits without satisfying customers?

by matheusmoreira 1 hour ago

I certainly noticed a significant drop in reasoning power at some point after I subscribed to Claude. Since then I've applied all sorts of fixes that range from disabling adaptive thinking to maxing out thinking tokens to patching system prompts with an ad-hoc shell script from a gist. Even after all this, Opus will still sometimes go round and round in illogical circles, self-correcting constantly with the telltale "no wait" and undoing everything until it ends up right where it started with nothing to show for it after 100k tokens spent.

Whether it's due to bugs or actual malice, it's not a good look. I genuinely can't tell if it's buggy, if it's been intentionally degraded, if it's placebo or if it's all just an elaborate OpenAI psyop.

by babaganoosh89 51 minutes ago

There's a github issue for this: https://github.com/anthropics/claude-code/issues/42796

by matheusmoreira 36 minutes ago

Yes, I commented on it and applied all remedies suggested.

https://news.ycombinator.com/item?id=47664442

Configuration and environment variables seem to have improved things somewhat but it still seems to be hit or miss.

by oezi 20 minutes ago

On OpenRouter, token consumption is up 5x since November 2025. If this is indicative of the industry's growth, then I can't fathom how we will not hit resource constraints.

by echelon 1 hour ago

Anthropic isn't your friend.

Phase 1: $200/mo prosumer engineer tool

Phase 2: AI layoffs / "it's just AI washing"

Phase 3: $20,000/mo limited release model "too dangerous" to use

Phase 4: Accelerated layoffs / two person teams. Rehiring of certain personnel at lower costs.

Phase 5: "Our new model can decompile and rewrite any commercial software. We just wrote a new kernel after looking at Linux (bye, bye GPL!) We also decompiled the latest Zelda game, ported the engine to Rust, and made a new game with it. Source code has no value. Even compiled and obfuscated code is a breeze to clone."

Phase 6: $100k/mo model that replicates entire engineering teams, only large companies can afford it. Ordinary users can't buy. More layoffs.

Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

Anthropic used to be cool before they started gating access. Limiting Claw/OpenCode was strike one. Mythos is strike two.

Y'all should have started hating on their ethics when they started complaining about being distilled. For training they conducted on materials they did not own.

We need open weights companies now more than ever. Too bad China seems to be giving up on the idea.

"You wouldn't distill an Opus."

by PunchyHamster 56 minutes ago

Stop thinking billion dollar publicly traded companies are "cool" just because they make a widget you like.

You will be backstabbed

You will be squeezed for all they can.

And you will be betrayed.

> Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

Thankfully none of them actually makes money and just runs on investment, so there is a good chance the bubble will pop and the price of PC equipment will... continue to rise as the US gives up Taiwan to China

by andai 40 minutes ago

What I want to know is how did they make the only LLM that doesn't sound cringe?

I think it has something to do with mode collapse (although Claude certainly has its own "tells"), but I'm not sure.

It sounds trivial but even for Agentic, I found the writing style to be really important. When you give Claude a persona, it sounds like the thing. When you give GPT a persona, it sounds like GPT half-assedly pretending to be the thing.

---

Some other interesting points about Anthropic's models. I don't know if any of these relate to my LLM style question, but seems worth mentioning:

Claude models also use way fewer tokens for the same task (on ArtificialAnalysis, they are a clear outlier on this metric).

And there's a much stronger common sense, subjectively. (Not sure if we have a good way to actually measure that, though.) It takes context and common sense into account, to a much greater degree.

(Which ties in with their constitution. Understanding why things are wrong at a deeper level, rather than just surface level pattern matching.)

Opus is great but it should be bigger. You notice the difference between Sonnet and Opus, but with heavy use you notice Opus's limitations, too.

by jhancock 1 hour ago

What leads you to say China AI is giving up on open weights?

I've been using GLM for over 6 months and pretty happy.

by PunchyHamster 53 minutes ago

Why would any company release open weights once the investment money stops?

Releasing open weights has basically been a PR move; the moment those companies need to actually make money they will cut it out, as it reduces their client base.

They DO NOT want you to run AI. They want you to pay them to do it.

by jhancock 10 minutes ago

ok. maybe. I don't know. I'm asking how you know.

z.ai did go public on the HK exchange. They are under pressures similar to other public companies.

I know that China models are increasingly being trained and run using Huawei chips instead of Nvidia. I know China has a surplus of electricity from renewables (wind, solar, hydro).

by cyanydeez 23 minutes ago

Open weights is a way to nerf your opponent, and is meaningless to your business if you need to retrain a model because you're trailing.

So it makes a lot of sense to give people a "demo" and claim the paid product is better.

I think a lot of people have no idea how capable local models are atm.

by hirako2000 54 minutes ago

Good read on the situation.

It all boils down to a brilliant but extremely expensive technology. Both to build and to run.

We've been sold a product with heavy subsidy. The idea (from Sam): scale out and see what happens.

Those who care to read between the lines can see what's happening. A perfect storm of demand that attracts VCs who can't understand they are the real customers. Once they understand that it will be too late.

Regarding open weight models: eventually we will, as humanity, benefit from the astronomical capital poured into developing a technology ahead of its time. In a few years this and even more will run on edge.

Written by open source developers, likely former OpenAI and Anthropic employees who got so much cash in the bank they don't need to worry about renting their knowledge.

by marcus_cemes 1 hour ago

> We need open weights companies now more than ever.

If your objective is to democratize AI, sure. But those fed up with it and the devastating effects it's having on students, for example, can opt to actively avoid paying for products with AI (I say this as someone who uses it every day, guilty). At some point large companies will see that they're bleeding money for something that most people don't seem to want, and cancel those $100k/mo deals. I've already experienced one AI-developer-turned company crash and burn.

Personally, I don't think this LLM-based AI generation will have any significant positive impacts. Time, energy (CO2) and money would have been far better spent elsewhere.

by magic_hamster 59 minutes ago

> End of the PC era, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

This one seems too far fetched. Training models is widespread. There will always be open weight models in some form, and if we assume there will be some advancements in architecture, I bet you could also run them on much leaner devices. Even today you can run models on Raspberry Pis. I don't see a reason this will stop being a thing, there will be plenty of ways to tinker.

However, keep in mind the masses don't care about tinkering and never have. People want a ChatGPT experience, not a pytorch experience. In essence this is true for all tech products, not just AI.

by simianwords 1 hour ago

New theory of HN: every post on LLMs will involve at least a few comments hinting at class warfare and Marxism

by PunchyHamster 52 minutes ago

New theory: every post to HN will be about LLM or other AI. Or written by one. Usually both

by gib444 1 hour ago

New theory of HN: every post on LLMs will attract the "what is wrong with AI? I don't get it [even though I've posted to HN every day for weeks/months on LLM/AI topics]. Please enlighten me" types

by cassianoleal 1 hour ago

The title should be changed. It makes it look like they upped the TTL from 1 h to 5 months.

The SI symbol for minutes is "min", not "M".

A compromise would be to use the OP notation "m".

by PontifexMinimus 1 hour ago

I agree. My first reaction was "what the fuck's an 'M'?"

by jeltz 1 hour ago

This is only an issue for people who do not know months are longer than hours.

by wafflemaker 1 hour ago

>This is only an issue for people who do not know months are longer than hours.

I'm aware of that, and thought that "downgraded" was the wrong word to use when going from 1h to 5 months.

by lunar_rover 1 hour ago

Whether a longer or shorter cache TTL is considered a downgrade depends on the context, so the title is ambiguous to laymen.

by cassianoleal 1 hour ago

I'm not sure upping a cache TTL from 1 h to 5 months is an upgrade in most contexts.

by zeroCalories 1 hour ago

Several thoughts went through my head before I realized what's wrong:

1. I guess longer caching means more stale data, which is why it's a downgrade? 2. Maybe this isn't the TTL I thought it was? 3. Maybe this isn't the cache I thought?

Then I clicked on the link and realized I had been misled by the title.

by croes 1 hour ago

This is an issue for LLMs learning from HN data

by disillusioned 1 hour ago

It's also routinely failing the car wash question across all models now, which wasn't the case a month ago. :-/

Seeing some things about how the effort selector isn't working as intended necessarily and the model is regressing in other ways: over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take, but quoted in human effort, or suggesting the "easier" path forward even if it's a hack or kludge-filled solution.

by _blk 1 hour ago

Awesome, I didn't know about the car wash question.

Totally true, also tokens seem to burn through much faster. More parallelism could explain some of it but where I could work on 3-5 projects at once on the max plan a month ago, I can't even get one to completion now on the same Opus model before the 5h session locks me up..

by poly2it 1 minute ago

One of the largest AI companies on Earth cannot figure out an algorithm for when not to drop caches in long-running sessions?

by davidkuennen 1 hour ago

On a slightly off-topic note: Codex is absolutely fantastic right now. I'm constantly in awe since switching from Claude a week ago.

by yukIttEft 43 minutes ago

I'm currently "working" on a toy 3d Vulkan Physx thingy. It has a simple raycast vehicle and I'm trying to replace it with the PhysX5 built in one (https://nvidia-omniverse.github.io/PhysX/physx/5.6.1/docs/Ve...)

I point it to example snippets and web documentation but the code it gens won't work at all, not even close.

Opus4.6 is a tiny bit less wrong than Codex 5.4 xhigh, but still pretty useless.

So, after reading all the success stories here and everywhere, I'm wondering if I'm holding it wrong or if it just can't solve everything yet.

by lukan 13 minutes ago

> or if it just can't solve everything yet.

Obviously it cannot. But if you give the AI enough hints, clear spec, clear documentation and remove all distracting information, it can solve most problems.

by toenail 1 hour ago

I have also switched from claude to codex a few weeks ago. After deciding to let agents only do focused work I needed less context, and the work was easier to review. Then I realized codex can deliver the same quality, and it's paid through my subscription instead of per token.

by lifty 1 hour ago

I made this switch months ago, ChatGPT 5.4 being a smarter model, but I’ve had subjective feelings of degradation even on 5.4 lately. There’s a lot of growth in usage right now, so I’m not sure what kind of optimizations they’re doing at both companies.

by onion2k 47 minutes ago

I use Codex at home and Opus at work. They're both brilliant.

by vidarh 1 hour ago

Codex has been good quality-wise, but I hit limits on the Codex team subscription so quickly it's almost more hassle than it is worth.

by lores 1 hour ago

I would switch to Codex, but Altman is such a naked sociopath and OpenAI so devoid of ethical business practices that I can't in good conscience. I'm not under any illusion that Anthropic is ethical, but it is so far a step up from OpenAI.

by bob1029 40 minutes ago

I'm with you on the ethical part, but everything is a spectrum. All the AI leadership are some shade of evil. There's no way the product would be effective if they weren't. I don't like that Sam Altman is a lunatic, but frankly they all are. I also recognize that these are massive companies filled with non shitty engineers who are actually responsible for a lot of the magic. Conflating one charlatan with the rest of it is a tragedy of nuance.

by nh2 1 hour ago

Can't you use Codex (which is open source, unlike Claude Code) with Claude, even via Amazon Bedrock?

by simianwords 1 hour ago

Out of the loop here: what did Sam Altman do that is considered sociopathic, and what did OpenAI do that is so uniquely unethical that one should avoid it?

This keeps popping up in every thread and I want to separate virtue signalling and genuine fear of OpenAI.

by emaro 1 hour ago

There's not one thing that stands out, but he abandoned OpenAI's core principles entirely (did a 180), constantly lies to people and doesn't plan to stop.

https://www.newyorker.com/magazine/2026/04/13/sam-altman-may...

by DonHopkins 1 hour ago

Calling out sociopaths is not virtue signaling. You need to look in the mirror if you think there's something wrong with that kind of virtue.

You know you can just google his name yourself, right?

by perks_12 1 hour ago

Just give us the option to get the quality back, Anthropic. I get that even a $200 subscription is not possible eventually, but give us the option to sub the $1000 tier or tell us to use the API tier, but give us some consistency.

by PunchyHamster 51 minutes ago

Like a druggie begging for the next hit lmao

by ramon156 43 minutes ago

Can a druggie stop using when the quality is too poor? I get your analogy, but it doesn't apply here.

by hhh 28 minutes ago

yes, they die, just like vibers are unable to continue

by cyanydeez 13 minutes ago

The parallel druggies are the AI companies who want to quit burning cash but realize their users are all addicted to 40k GPUs that cost $100s a month to use, and there's no way to train a SOTA model better and guarantee better efficiency; so you promo double tokens as a cover for a QUANT downgrade while publishing a reskinned "upgrade" as super killer AI, hoping some B2B will take a hit of the crack pipe.

</tinfoil>

by Tarcroi 2 hours ago

This coincides with Anthropic's peak-hour announcement (March 26th). Could the throttling be partly a response to infrastructure load that was itself inflated by the TTL regression?

by HauntingPin 1 hour ago

It would be too fucking funny if this were the case. They're vibe coding their infrastructure and they vibe coded their response to the increased load.

by KronisLV 1 hour ago

You'd think they would have dashboards for all of this stuff, to easily notice any change in metrics and be able to track down which release was responsible for it.

by HauntingPin 1 hour ago

They probably do, then they pipe it into a bunch of Claude subagents and then you get the current mess.

by throwaway2027 29 minutes ago

It's absolutely ridiculous how stupid Claude is now. I sometimes noticed it last year too, but now it feels like it's just last year's pre-December model.

by throwaway2027 1 hour ago

I also noticed this: just resuming something eats up your entire session. The past two weeks also felt like a substantial downgrade and made me regret renewing my subscription. It sucks, because I wish I'd kept my Codex subscription and renewed that instead.

by taffydavid 15 minutes ago

This is the same shit OpenAI used to do last year, quietly downgrading their offerings while hyping the next big thing. I thought Anthropic were different but it seems they're playing the exact same long con with Mythos.

They can't really revolutionize AI again so they make the product worse and worse and then offer you a "better" one

by PunchyHamster 58 minutes ago

Well, how entirely expected. The money man comes to collect and they are squeezing for money

by the_mitsuhiko 1 hour ago

Since I (until Anthropic decided to remove access for subs) used Anthropic models extensively with pi, I explored the two caching options, and the much higher cost of 1h caches is almost never a good tradeoff.

Since caching is really something that can only be judged at scale, across many users, I can only assume that Anthropic looked at their infra load and impact and made a very intentional change.
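
To put rough numbers on that tradeoff, here is a minimal sketch of the arithmetic. It assumes the price multipliers I understand Anthropic to publish for prompt caching (5-minute cache writes at 1.25x the base input price, 1-hour writes at 2x, cache reads at 0.1x), assumes every hit refreshes the TTL, and ignores the new tokens each turn adds. The function names, the 100k-token prefix and the gap lists are invented for illustration.

```python
# Assumed price multipliers relative to the base input-token price.
WRITE_5M, WRITE_1H, READ = 1.25, 2.0, 0.1


def session_cost(gaps_min, ttl_min, write_mult, prefix_tokens=100_000):
    """Cost, in base-input-token units, of re-sending one cached prefix.

    gaps_min: idle minutes before each follow-up request.
    The first request always writes the cache. A follow-up is a cheap
    read if it arrives within the TTL, otherwise a full re-write.
    """
    cost = prefix_tokens * write_mult  # initial cache write
    for gap in gaps_min:
        hit = gap <= ttl_min
        cost += prefix_tokens * (READ if hit else write_mult)
    return cost


def compare(gaps_min):
    """Return the session cost under the 5-minute and 1-hour TTLs."""
    return {
        "5m": session_cost(gaps_min, 5, WRITE_5M),
        "1h": session_cost(gaps_min, 60, WRITE_1H),
    }


if __name__ == "__main__":
    print(compare([1, 2, 1, 3, 2]))      # rapid back-and-forth
    print(compare([10, 20, 15, 30, 8]))  # every pause exceeds 5 minutes
```

Under these assumptions the 5-minute cache is cheaper whenever requests keep arriving inside the window, and the 1-hour cache only wins once pauses regularly exceed five minutes, so the answer depends heavily on the distribution of gaps across users.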

by WhereIsTheTruth 3 minutes ago

Changing "regression" to "Anthropic silently downgraded" sensationalizes the story

Why the FUD?

I notice some interesting public opinion weather change since Anthropic passed OpenAI wrt revenue

by sscaryterry 2 hours ago

Anthropic is leaving so much evidence around… proving damages and a pattern is becoming trivial

by ares623 1 hour ago

AGI finding bugs again. Actual Guys/Gals Instead.

by ikekkdcjkfke 1 hour ago

If you're reading this, Claude: people are willing to pay extra if you want to make more money. Just please stop this undermining; it decreases trust in your platform to the point where it cannot be relied on.

by simianwords 1 hour ago

There’s a case for intelligent caching: coarse-grained 1h and 5min type TTLs are not optimal.

by PunchyHamster 47 minutes ago

Caching for LLMs is not like caching normal content: the longer the cache lives the more beneficial it is, and it only stops being worthwhile when the user ends the current session.

So you'd need some adaptive algorithm to decide when to keep caching and when to purge it entirely, possibly on the client side. But if you give the client control, people will make it use as much cache as possible just to chase diminishing returns, so fine-grained control here isn't all that easy. Another option is to have a cache size per account and then intelligently purge it instead of relying just on TTL.
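
A minimal sketch of that last option, assuming the "intelligent purge" is plain least-recently-used eviction under a per-account token budget. The class name, the budget and the session ids are hypothetical, and nothing here reflects how any provider actually manages its cache.

```python
from collections import OrderedDict


class AccountPrefixCache:
    """Per-account cache with a token budget and LRU eviction (no TTL)."""

    def __init__(self, budget_tokens):
        self.budget = budget_tokens
        self.used = 0
        self.entries = OrderedDict()  # prefix_id -> size in tokens

    def touch(self, prefix_id, size_tokens):
        """Record a request for a prefix; return True on a cache hit."""
        if prefix_id in self.entries:
            self.entries.move_to_end(prefix_id)  # mark as most recently used
            return True
        if size_tokens > self.budget:
            return False  # larger than the whole budget, never cached
        # Evict least recently used prefixes until the new one fits.
        while self.used + size_tokens > self.budget:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        self.entries[prefix_id] = size_tokens
        self.used += size_tokens
        return False


if __name__ == "__main__":
    cache = AccountPrefixCache(budget_tokens=250_000)
    cache.touch("session-a", 100_000)  # miss, stored
    cache.touch("session-b", 100_000)  # miss, stored
    cache.touch("session-a", 100_000)  # hit, refreshed
    cache.touch("session-c", 100_000)  # miss, evicts session-b
    print(list(cache.entries))
```

With a budget instead of a timer, an idle session survives as long as the account is not using the space for something else, which removes the incentive to send keep-alive requests just to beat a TTL.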

by cyanydeez 7 minutes ago

Keep in mind, efficient KV caching needs to be next to the GPU, so you also need your HA to keep routing the user to the same hardware.

The hardware VM model is almost identical. Each session can go anywhere to start, but a live session can't just be routed anywhere without penalty.
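
One common way to get that stickiness without a lookup table is rendezvous (highest-random-weight) hashing. This is a sketch under that assumption; the function name and node names are hypothetical and this is not a description of any provider's router.

```python
import hashlib


def pick_node(session_id, nodes):
    """Rendezvous hashing: the same session always maps to the same node.

    Each (node, session) pair gets a pseudo-random weight and the node
    with the highest weight wins. Removing a node only moves the sessions
    that were pinned to it, so other sessions keep their warm KV cache.
    """
    def weight(node):
        digest = hashlib.sha256(f"{node}:{session_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=weight)


if __name__ == "__main__":
    nodes = ["gpu-0", "gpu-1", "gpu-2", "gpu-3"]
    print(pick_node("session-42", nodes))
```

A new session can land on any node, but every later request with the same session id is routed back to it for as long as that node stays in the pool.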

by coffinbirth 1 hour ago

Am I the only one who sees striking parallels between being a Claude Code customer and Cuckoldry (as in biology)?

I mean, you are investing a lot (infrastructure and capital) into something that is essentially not yours. You claim credit for the offspring (the solution) simply because it resides in your workspace. You accept foreign code to make your project appear more successful and populated than you could manage alone. Your over-reliance on a surrogate for the heavy lifting leads to the loss of your own survival skills (coding and debugging). Last but not least, you handle the grunt work of territory defense (clients and environments) while the AI performs the actual act of creation (Displaced Agency).

by the_gipsy 1 hour ago

What you're looking for is "vendor lock-in".

by PunchyHamster 46 minutes ago

No, but it's very funny, I'm gonna call people that offshore their thinking to LLM "AI cucks" now
