by embedding-shape 8 hours ago

by vbezhenar 8 hours ago

> does the users who use Anthropic switch over to those even if they're available even as hosted models?

I'm currently spending $200 for Claude. That's around the maximum I can afford. I could stretch that to $500, I guess. But I saw reports of people spending tens of thousands of dollars with the Claude API. That's certainly outside of my budget.

So if/when Anthropic decides to stop subsidizing subscriptions (if they ever do that, I'm still not sure about it), I'll certainly look at the other options. And available "open weights" LLMs hosted by someone will be my first pick. Right now Claude 4.8 feels very advanced, but things move very fast...

by HDThoreaun 7 hours ago

The AI labs would be very dumb to get rid of subscriptions. First, I don’t even think the subscriptions are losing money. I suspect they’re around break even, maybe small losses. More importantly, the subscriptions are how they lock in users and convince companies to pay API rates. Without the user loyalty that they cultivate with subscriptions, businesses will just use the cheapest model on OpenRouter or maybe local models.

by dominotw 3 hours ago

> I don’t even think the subscriptions are losing money, I suspect they’re around break even, maybe small losses

What's the basis for this thought?

by vineyardmike 3 hours ago

For Claude specifically, (1) enterprises pay API rates on top of subscriptions, so subscription profitability questions are only relevant for smaller companies and indie devs, many of whom probably have sporadic or low usage, which helps balance out some heavy users.

Again, for Claude, (2) it’s rumored that their API rates have around a 90% profit margin. It’s also claimed that the subscription limits get you around 10x the tokens per monthly dollar vs. buying them at API rates.

Edit: to drive it home. If a token’s true cost to Anthropic is 1/10 of what they sell it for at API rates, and a subscription gets you tokens at 1/10 the price, that’s cost-neutral for the business even if every subscriber uses every token. They’re selling tokens at cost, not at a loss. Many subscription users won’t use their full allotment, so serving them costs the business less than they pay, which might push the subscription business from cost-neutral to profitable.
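The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the comment's reasoning: the 1/10 cost fraction and the 10x token multiplier are the rumored figures quoted above, not confirmed numbers, and the function and parameter names are made up for this example.

```python
def subscription_margin(cost_fraction: float, token_multiplier: float, utilization: float) -> float:
    """Profit margin on subscription revenue (0.0 = break even, negative = loss).

    cost_fraction:    true cost of a token as a fraction of its API price (rumored ~0.1)
    token_multiplier: tokens per subscription dollar relative to API pricing (claimed ~10)
    utilization:      share of the allotment the average subscriber actually uses (0..1)
    """
    # API-priced value of the tokens actually consumed per subscription dollar.
    api_value_used = token_multiplier * utilization
    # What those tokens cost the provider to serve, per subscription dollar.
    serving_cost = cost_fraction * api_value_used
    return 1.0 - serving_cost


# Assumed inputs: 90% API margin (cost_fraction 0.1) and 10x tokens per dollar.
for utilization in (1.0, 0.5, 0.25):
    margin = subscription_margin(0.1, 10.0, utilization)
    print(f"utilization {utilization:.0%}: margin {margin:.0%}")
```

Under these assumptions the margin is zero only when every subscriber burns the whole allotment, and it rises as average utilization falls.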

by dominotw 3 hours ago

Not sure how that shows that subscriptions are not losing money.

by epolanski 2 hours ago

What's the basis for the opposite?

by HDThoreaun 3 hours ago

I think they're charging at least 3x marginal cost for the API, and I think the average subscription uses around half its token allotment. So a subscription needs to give 6x as many tokens as the same money would buy at API rates before the lab merely breaks even, and that's around where they tend to sit.
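A quick sketch of that break-even estimate, assuming the guesses in the comment (3x API markup over marginal cost, 50% average utilization). The function name and parameters are hypothetical and only restate the comment's arithmetic.

```python
def break_even_token_multiplier(api_markup: float, utilization: float) -> float:
    """Tokens-per-dollar multiplier (vs. API pricing) at which a subscription breaks even.

    api_markup:  API price divided by marginal cost (the comment guesses at least 3)
    utilization: share of the allotment the average subscriber uses (the comment guesses 0.5)
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Each API dollar of tokens costs 1/api_markup to serve, and only a
    # `utilization` share of the allotment is consumed, so the plan can offer
    # api_markup / utilization times the API token count before it loses money.
    return api_markup / utilization


print(break_even_token_multiplier(3.0, 0.5))
```

Any plan that offers fewer tokens per dollar than this multiplier would be profitable under the same assumptions.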

by FuriouslyAdrift 7 hours ago

The hotness we are seeing is smaller 'expert' models with an 'orchestrator' model in front that evaluates the prompts, routes them to the appropriate small models, and then synthesizes the collected answer. This is easier to split across many smaller, cheaper servers and more efficient than a huge monolithic model.

by losvedir 7 hours ago

Do you have more info about this? I can't tell if you're being misled by the unfortunate "Mixture of Experts" terminology (which doesn't work the way you're describing), or alluding to something different.

Or, maybe I'm wrong, but my understanding is: MoE is just an architecture to keep the activated weights smaller per token. The experts get routed basically token-by-token, and the "experts" themselves don't have a semantic domain so the "expert" word was maybe a poor choice.
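To make the per-token routing concrete, here is a toy sketch of a sparsely gated MoE layer in plain Python. It is a simplification under stated assumptions: the gate is a bare linear map with no noise term or load balancing, the "experts" are trivial stand-in functions rather than feedforward networks, and every name here is invented for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(token, gate_weights, experts, k):
    """Route one token vector through its k highest-scoring experts.

    Returns the weighted sum of the chosen experts' outputs and their indices.
    """
    # Linear gate: one logit per expert, computed from this token only.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Keep only the k best experts; the others are never evaluated.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in chosen])
    output = [0.0] * len(token)
    for weight, index in zip(weights, chosen):
        expert_out = experts[index](token)
        output = [acc + weight * value for acc, value in zip(output, expert_out)]
    return output, chosen


# Toy stand-ins for expert networks (hypothetical, no semantic domain implied).
toy_experts = [
    lambda t: [2.0 * v for v in t],
    lambda t: [v + 1.0 for v in t],
    lambda t: [-v for v in t],
]
toy_gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

print(top_k_moe([1.0, 3.0], toy_gate, toy_experts, k=1))
print(top_k_moe([1.0, 3.0], toy_gate, toy_experts, k=2))
```

The choice of experts depends only on the gate scores for that one token, which is why the experts do not end up corresponding to human-readable topics.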

by everforward 5 hours ago

No, this is an agent-level thing, not a feature of the model (ish, unsure for Fable).

You talk to a smart, heavy model to build a plan composed of smaller steps. Then you have the heavy model spin up smaller, cheaper LLMs to actually implement the tasks.

The heavy model is basically read-only in that mode. It can read files, execute tests, etc., but it can’t write code. It just tracks what needs to be done, offloads the work to dumber LLMs, validates that the task is done, and moves on to the next step.

It sort of pushes humans up the stack. Instead of having a human sitting there prompting the LLM to start the next task, you have another LLM do that loop.

It’s been on my list to try out.
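A minimal sketch of the loop described above, assuming the plan is already a list of steps and that the worker and validator are plain callables. Nothing here calls a real LLM API; `orchestrate`, `stub_worker`, and `stub_validate` are hypothetical stand-ins for the heavy model's loop, a cheap implementing model, and the heavy model's read-only check.

```python
from typing import Callable, List, Tuple


def orchestrate(
    plan: List[str],
    worker: Callable[[str], str],
    validate: Callable[[str, str], bool],
    max_attempts: int = 2,
) -> List[Tuple[str, str, int]]:
    """Run each planned step through a cheap worker and check it before moving on.

    Returns (step, output, attempts_used) for every completed step.
    """
    results = []
    for step in plan:
        for attempt in range(1, max_attempts + 1):
            output = worker(step)        # cheap model does the writing
            if validate(step, output):   # heavy model only reads and checks
                results.append((step, output, attempt))
                break
        else:
            # No attempt passed validation; stop instead of building on a bad step.
            raise RuntimeError(f"step failed after {max_attempts} attempts: {step}")
    return results


# Stubs standing in for real model calls.
def stub_worker(step: str) -> str:
    return f"done: {step}"


def stub_validate(step: str, output: str) -> bool:
    return output == f"done: {step}"


print(orchestrate(["add parser", "write tests"], stub_worker, stub_validate))
```

The human's job in this setup shifts to reviewing the plan and the final result rather than prompting each step.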

by thesz 6 hours ago

https://en.wikipedia.org/wiki/Mixture_of_experts#Sparsely-ga...

"The sparsely-gated MoE layer,[21] published by researchers from Google Brain, uses feedforward networks as experts, and linear-softmax gating. Similar to the previously proposed hard MoE, they achieve sparsity by a weighted sum of only the top-k experts, instead of the weighted sum of all of them."

"Top-k experts": in the case of some of DeepSeek's models, k=1.

by bugglebeetle 5 hours ago

See OpenRouter’s recent announcement on a model fusion setup, which they now support via API:

https://openrouter.ai/blog/announcements/fusion-beats-fronti...

by xboxnolifes 2 hours ago

People don't pivot on a dime. If major model improvements stop for a few years and equivalent free models are available during the same period, we will see people slowly move over to competitors.

by ForHackernews 8 hours ago

> Anthropic and Claude remains very popular among the people who use LLMs

Only because someone else is paying the bills. I use Claude Opus at work because my employer pays for the tokens and encourages me to do it.

At home, I use DeepSeek Flash. It's not as good, but it's maybe 0.7x the quality for 0.001x the cost.

by LaurensBER 7 hours ago

Same, I had DeepSeek search for, download, and transfer (to my Linux emulation machine) the best Dreamcast games yesterday.

GPT refused to do so (citing that it's illegal even though I own the games). DeepSeek did a wonderful job for 7 cents.

At work I use Opus because, why not? But I could easily switch to a less capable model if needed.

by JCTheDenthog 5 hours ago

>citing that it's illegal even though I own the games

In the US at least, it is actually illegal to download ISOs/ROMs of games, even if you own a physical copy. It's a stupid law, and as a downloader (as opposed to the people hosting the files) your chances of getting into any kind of actual legal trouble are effectively zero, but it is still against the law.

by jtbayly 1 hour ago

I don’t think so. I’d want more than just your word on it.

by JCTheDenthog 14 minutes ago

https://www.tomshardware.com/news/why-most-roms-are-illegal,...

https://answers.justia.com/question/2025/08/04/is-downloadin...

https://www.howtogeek.com/262758/is-downloading-retro-video-...

etc. etc. etc. etc.

by mark_l_watson 8 hours ago

I have a question that perhaps you or someone else here has an answer for: I enjoy using Opus via Google Antigravity (usually agy) for perhaps 90 minutes a week. For Google’s subsidized $20/month plan they seem to give out a reasonably generous amount of Claude tokens. How does this compare with Anthropic’s $20/month plan using Claude Code?

BTW, I also use DeepSeek v4 Flash very frequently: fast and so cheap it is almost free.

by everforward 5 hours ago

It’s really hard to translate minutes to tokens; it depends on how you’re using it.

The best answer would be to pull session stats from your harness and compare that against the limits. I think Anthropic publishes the limits of each plan.

If you’re using a pretty stock harness and not doing crazy multi-agent stuff with it, you’re probably fine.

My girlfriend built a whole (but simple) React app with it and only hit the limits of the $20 plan once. In fairness, she was trying to get it to clean up a bunch of 800ish line React files at once with a vague “make it look nice” prompt that she ran a few times. I think it was just churning for like half an hour straight before she burned all her credits.

It’s probably enough if you’re not on a fully agentic development strategy. It’s plenty to have it write tests and do comments and stuff, just not enough to continually have it doing giant refactoring passes.

by trollbridge 5 hours ago

Anthropic's plans are based on user experience of usage, not raw token counts, so you get to run through so many conversation turns, etc., within a 5-hour usage window. (Cursor, OpenCode Go, and others are similar.)

Cursor's $20 a month plan provides a reasonable amount of Opus tokens as well.

by okdood64 6 hours ago

What's the speed on DeepSeek Flash? And what provider?

by ForHackernews 6 hours ago

Fast enough? I signed up directly with https://platform.deepseek.com/ because it was the cheapest I could find. I use both Anthropic and DeepSeek models via the VS Code Copilot plugin https://github.com/Vizards/deepseek-v4-for-copilot

by halJordan 7 hours ago

I don't think you're appropriately understanding the full gamut. The individuals who only spend $200/month will be stuck. But the pie is increasing in size; it's not stagnant. There are a lot of orgs who can afford to run a 1T model and even more that can run a 600B model. These newcomers are what's being fought over.
