Competition is bad? Who cares - let the big players subsidize and compete with each other. That's what we want. We want strong models at a low price, and we'll hype up whoever is doing it.

Simultaneously, we also hype up the open models that are catching up: the ones that are significantly cheaper, and that also put pressure on the big players and keep them in check.

People aren't falling for PR; people are encouraging the PR to put pressure on the competition. It's not that hard.

reply
Interesting to see your observation, because I have observed the opposite: posts that share big news about open-weight local models have many upvoted comments arguing local models shouldn’t be taken seriously and promoting the SOTA commercial models as the only viable options for serious developers.

Both here and on AI tech subreddits (ones that aren’t specifically about local or FOSS models), this dynamic seems to hold, to the degree that I’ve suspected astroturfing.

So it’s refreshing to see that maybe that’s just a coincidence or confirmation bias on my end.

reply
Both can be true at the same time. For almost all use cases, I currently wouldn't waste my time with open models, but they're crucial from a data privacy and competitive perspective, and I can't wait for them to catch up enough to be as useful as the current frontier models.
reply
I've found Qwen3 to be very usable on my local machine (a Framework Desktop with 128 GB of RAM). I doubt it could handle the complex tasks I throw at Claude Opus at work, but it's more than capable of doing a surprising number of tasks, with good performance.
reply
What tasks do you use Qwen3 for? Coding? Are you running it on CPU or GPU? What GPU does that Framework have?

Thanks!

reply
I have an Asus GX10 that I run Qwen3.5 122B A10B on, and I use it for coding through the Pi coding agent (and my own); I have to put more work in to ensure that the model verifies what it does, but if you do so it's quite capable.

It makes using my Claude Pro sub actually feasible: I write a plan with it, pick that up with my local model and implement it, and now I'm not running out of tokens haha.

Is it worth it from a unit economics POV? Probably not, but I bought this thing to learn how to deploy and serve models with vLLM and SGLang, and to learn how to fine-tune and train models with the 128 GB of memory it gets to work with. Adding up two 40 GB vectors in CUDA was quite fun :)
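
For the curious, here's roughly what that exercise looks like. This is a minimal sketch, not my exact code, and it assumes cudaMallocManaged so the driver can back the 40 GB buffers with the box's 128 GB unified memory pool rather than discrete VRAM:

  #include <cuda_runtime.h>
  #include <cstdio>

  // c[i] = a[i] + b[i], one element per thread
  __global__ void add(const float *a, const float *b, float *c, size_t n) {
      size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;  // 64-bit index: n exceeds 2^32
      if (i < n) c[i] = a[i] + b[i];
  }

  int main() {
      const size_t n = 10ULL << 30;  // 10Gi floats ~= 40 GB per vector
      float *a, *b, *c;
      // Managed memory pages between system RAM and the GPU on demand,
      // so three 40 GB buffers fit in the 128 GB unified pool
      cudaMallocManaged(&a, n * sizeof(float));
      cudaMallocManaged(&b, n * sizeof(float));
      cudaMallocManaged(&c, n * sizeof(float));
      for (size_t i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

      const int threads = 256;
      const unsigned blocks = (unsigned)((n + (size_t)threads - 1) / threads);
      add<<<blocks, threads>>>(a, b, c, n);
      cudaDeviceSynchronize();

      printf("c[0] = %f, c[n-1] = %f\n", c[0], c[n - 1]);  // expect 3.0 for both
      cudaFree(a); cudaFree(b); cudaFree(c);
  }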

I also use Z.ai's Lite plan for the moment for GLM-5.1, which is very capable in my experience.

I was using Alibaba's Lite Coding Plan... but they killed it entirely after two months haha, too cheap obviously. Or all the *claw users killed it.

reply
GLM 5.1 is extremely good, and ridiculously cheap on their coding plan. It's far better than Sonnet, and a fifth of the cost at API rates. I don't know if the American providers can compete long-term; what good is it to be more innovative if it only buys them a six-month lead and they can't build the data center capacity fast enough for demand? Chinese providers have a huge advantage in electrical grid capacity.
reply
True, but Z.ai also just silently raised the price, and the entire Chinese frontier set is having to turn a profit now... hence Alibaba killing the Lite plan and not letting people sign up for their Pro one either, and why MiniMax has their non-commercial license, etc. etc.

So I agree with you, it's better than Sonnet but way cheaper. I do wonder how long that will last, though.

reply
Z.ai does really well at the carwash question!
reply
Thank you. I've been using Ollama for a much more modest local inference system. I'll research some of the things you've mentioned.
reply
[dead]
reply
The Framework Desktop has a Ryzen 395 chip that can allocate memory to either the CPU or the GPU. I've been able to allocate 100+ GB to the GPU, so even big models can run there.

Most recently I used it to develop a script to help me manage email. The implementation included interacting with my provider over JMAP, taking various actions, and implementing an automated unsubscribe flow. It was greenfield, and quite trivial compared to the codebases I normally interact with, but it was definitely useful.

reply
That's great. Ostensibly my system could also allocate some of its 32 GB of system memory to augment the 12 GB of VRAM, but I've not been able to get it to load models over 20B. I should spend some more time on it.
reply
I'm just waiting till I can afford a GPU again
reply
I've invested significant time into getting open models to work, and investigating what works well.

The TL;DR is that unless you're doing it as a hobby, or you're working in an environment where none of the data privacy options supported by Anthropic/OpenAI (including running on Azure/Bedrock with ZDR) work for you, it's not worth it.

The best open models are around the Sonnet 4.6 level. That's excellent, but the level of task you can give to GPT 5.4 or Opus 4.6 is just so much higher that it doesn't compare (and Opus 4.7 seems noticeably better in my few hours of testing too).

I have my own benchmarks, but I like this much under-publicized OpenHands page: https://index.openhands.dev/home

It shows that for every task they test, closed models do the best. The closest an open model gets is MiniMax 2.7 on issue resolution, where it's ~1% worse than the leaders.

That matches my experience - fine for small problems, but well behind as the task gets bigger.

reply
deleted
reply
> Interesting to see your observation, because I have observed the opposite: posts that share big news about open-weight local models have many upvoted comments arguing local models shouldn’t be taken seriously and promoting the SOTA commercial models as the only viable options for serious developers.

When I argue this, my point is that FOSS shouldn't target the desktop with open weights - it should target H200s. Really big parameter models with big VRAM requirements.

Those can always be distilled down, but you can't really go the other way.

reply
I agree, but I’d like to add that people are definitely falling for PR. People are always falling for PR, or no one would bother with PR.
reply
This assumes people are in touch with reality and aren't just motivated by vibes and insta-reactions on social media
reply
> Competition is bad? Who cares - let the big players subsidize and compete with each other.

Subsidizing is the opposite of competing. It's literally the practice of underpricing your product to box out competition. If everyone were competing on a level playing field, they would all price their products above cost.

All these tech oligarch asshat companies need to be regulated to hell and back.

reply
The moat was already too large for smaller players. Let them subsidize. Take from investors and give to us, buying me time to beef up my local stack to run local models.

For many things you already need to go local, and in the future, if you want any privacy, you'll need to go local.

reply
Excellent point, but I still think the oligarchs have gotten a little monopoly-happy.
reply
What's the alternative, move to North Korea?
reply
Well, that's a great big wtf out of left field.
reply
deleted
reply
Big players subsidizing is what kills medium and small players which then kills competition. What follows is monopoly.

Big players operating at a loss to distort the market is not a good thing overall.

reply
The medium and small players are literally just distilling the larger models.

It's not the smaller players spending billions on training data.

reply
No, the medium and small players are the Mistrals, DeepSeeks, and H Companies of the world, with their own models using quirky optimisation techniques to be able to compete.
reply
It's hilarious how much this post reads as drafted by an LLM. The emdash, the "it's not X, it's Y" framing, incredible.
reply
I wrote my post myself.
reply
Dogfooding by the slop factory. The artificial centipede.
reply
Call it falling for it, but here are my two experiences, with both applications open ($20/month plan for both):

  - Claude: Good for ~20 minutes of work once every 4 hours
  - Codex: Good for however long I want to use it.  
Claude nerfed their product to the point that it's not usable for me, so I use something else.
reply
Since we’re sharing anecdata: I also have the $20/month plan for Codex, and I hit the five-hour limit after about an hour of work every single time I open it. I use it for personal side projects, primarily in the evening after the kids are in bed, so my strategy is to launch it about 4 pm and send a simple prompt to prime the five-hour window to end at 9 pm, start working about 8 pm, and then I can use up the existing five-hour window and the next one by about 10 pm.
reply
What kind of side projects do you need to run these models for that many hours? I haven't experimented with Opus to that extent; I mostly supervise it and/or prompt it every 5-10 minutes to fix something up.
reply
I've done a variety of things with it:

- sysadmin tasks for my home server, which runs Home Assistant, Plex, and Minecraft servers. Being able to tell it "Set up a Minecraft Fabric server with this list of mods" is pretty nice, and it's fairly competent at putting together Home Assistant dashboards and automations (make sure you have backups of anything it's allowed to touch, though - it may delete stuff without warning).

- Several small web apps primarily for my own use.

- Currently working on an opinionated desktop writing app for my own use.

reply
I'm on the 100 USD plan with Anthropic; I hit the 5-hour limits about 75% of the time during working hours, but almost never the weekly ones - by the time they're reset I've usually used up between 50% and 75% of the quota. There are periods of more intense usage ofc, but this is the approx. situation I'm in (also it doesn't work on tasks while I'm asleep, because I occasionally like having a look at WIP stuff and intervening if needed).

The Anthropic 20 USD plan would more or less be a non-starter for agentic development, at least for the projects that I work on, even while only working on a single codebase or task at a time (I usually do 1-3 at a time).

I would be absolutely bankrupt if I had to pay per-token. That said, I do mostly just throw Opus at everything (though it sometimes picks Sonnet/Haiku for sub-agents for specific tasks, which is okay), so probably not a 100% optimal approach, but I've wasted too much time and effort in the past on sub-optimal (non-SOTA) models anyway. I wonder which is closer to the actual cost and how much subsidizing there is going on.

reply
The $200 OpenAI plan feels like it has 10x the limit of the $100 Claude plan.

But Opus is both smarter and faster than GPT, so I can get a lot more done within the Claude limits.

reply
for now... right now you are getting 2x usage as a promo
reply
Concur, re the ratio of weekly vs hourly limits: I hit the hourly one much more often than weekly.
reply
Wow, the $20 Claude plan sounds awful. I use Claude at work, which has metered billing, and I have to be careful not to hit my four-figure max cap.

For me, $20 a month is more than I want to spend, so I just use the free tiers. If I use AI in an app or site, I use older models, mostly ChatGPT 3.5. The challenge is more fun, and it means I can do more, like make more API calls - 100x more.

reply
I use the $20 plan for my side projects, and in the beginning I was hitting limits very fast, but after creating proper .md files and running /clear, it seems to work fine for my use. I am really curious how people are using the $100-$200 plans. Maybe I am not utilizing it to its full capacity?
reply
[dead]
reply
There's a systematic marketing campaign from OAI on Reddit and HN - there's been a huge uptick of "Codex is better than Claude Code" comments and posts this last week, perfectly timed with Claude Code's increased limits.
reply
Go to /r/codex and see how pissed off people are by the new Codex Plus plan 5-hour limits (they're a sliver of what they were a week ago). Whatever OpenAI is doing to market on Reddit isn't working.
reply
I'm not sure what changed or what the complaint is ... But personally, I have still never hit the rate limit on the $20/mo ChatGPT Plus plan, while I was constantly getting kicked off the Claude Pro plan until I got fed up and cancelled a few months ago.
reply
I can get about 20~40 minutes out of my 5-hour limit using Codex 5.4 medium to, say, write a patch script in TypeScript for a Firebase + BigQuery app. That includes about 10 minutes of first writing a planning.md doc with 5.2 High.

A couple weeks ago I'd get roughly 2~3 hours. And a month before that I couldn't break the 5-hour limit.

reply
They were running a 2x rate limit promo last month.
reply
To be fair, GPT 5.4 is mostly a better model than Opus 4.6 in terms of quality of work. The tradeoff is that it's less autonomous and takes longer to complete equivalent tasks.
reply
Thing is, Codex 5.3 is a better and more consistent model than anything Anthropic have come out with. It can deal with larger codebases, has compaction that works, and has much less of a tendency to resort to sycophantic hallucination as it runs out of ideas. I also appreciate their approach to third party harnesses like opencode, which is obviously the complete opposite to Anthropic and their scramble to keep their crumbling garden walls upright.

Which makes it even more of a shame that Sam Altman is such a psychopathic jackass.

reply
So Anthropic degraded their product. OAI updated their product to meet or exceed Anthropic's old product.

This is normal behavior and not a cause for such a hyperbolic response.

reply
There is good competition and bad competition.

Pricing your product unsustainably vs a competitor to gain market share is regarded as "bad competition" and has historically been seen as anticompetitive.

It does not benefit the consumer in the long run, because the goal is to use your increased funding or cash reserve to wipe your competition out of the market, decreasing competition in the long term.

Then, once your competition is gone, and you've entrenched yourself, you do a rug pull.

reply
You're right, but for now it doesn't matter: if both competitors are running on infinite VC money, we as consumers benefit from it. It only matters if they cause negative externalities in the meantime.
reply
These are the benefits of competition in action.
reply
To be clear, unsustainably hemorrhaging money to gain marketshare over a competitor is generally considered an anticompetitive practice.
reply
What if both competitors are doing it?
reply
It’s also THE playbook of Silicon Valley.
reply
Also why there’s so much enthusiasm for it on HN
reply
This is true. But Anthropic did us dirty most recently, and so it’s their turn on the pitchfork. Sam will do us too. Just not yet.
reply
It's one of the things I really dislike about providers hyping "inference time scaling" as a concept. Apart from being a blatant misnomer (there's nothing scalable about it), it's so transparently a dial they can manipulate to shape perception. If they want a model to seem more intelligent than it really is, they just dial up the "thinking" and burn tokens. Then once you have people fooled, you can dial it down again. Everyone will assume it's their own fault that their AI suddenly isn't working properly. And since it's almost entirely unmeasurable, you can do it selectively for any given product you want to pitch, for any period of time you like, and then pull the rug.

We need to force them back into being providers of commodity services and put an end to this assumption that they can mold things in real time.

reply
I have a feeling that Codex is also getting lower limits. Got this email just now. Basically they're copying Claude's $100 tier.

> To help you go further with Codex, we’re introducing a new €114 Pro tier designed for longer, high-intensity sessions.

> At launch, this new tier includes a limited-time Codex usage boost, with up to 10x more Codex usage than Plus (typically 5x).

> As the Codex promotion on Plus winds down today, we’re rebalancing Plus usage to support more sessions across the week, rather than longer high-intensity sessions on a single day.

reply
They didn't just lower limits; they keep messing with people's local settings, and I wish it would be called out drastically more, because it could cause serious issues. A coding agent's settings are a contract, even the default ones. If they worked for me for 9 months, you shouldn't just force new defaults on me without warning; Claude can and will goof up hard if misconfigured.
reply
Thinking in counterfactuals, how would the hype around Codex be different if it were organic and because they had built a genuinely good product? Asking as someone who genuinely loves Codex and has been in the OpenAI camp for months after buying a Claude Max plan from November to February.
reply
I haven’t noticed much hype around Codex. I have both and use Claude for broad work off my phone and Codex on my computer to clean up the mess. Crank reasoning to the highest setting for each. Claude is extremely unreliable for me, and Codex feels like more of a real tool. I’d say Codex has a bit of a learning curve. Nothing much has changed for me in the past month or two (whenever GPT 5.4 came out).
reply
deleted
reply
It's quite likely that OpenAI is running a significant PR campaign to compensate for the bad rep they earned by stepping in to meet the demands of the Trump administration, after Anthropic refused to assist the administration with mass domestic surveillance and development of lethal autonomous weapons. Presumably OpenAI didn't buy the podcast TBPN just because they like the guys.

https://paulgraham.com/submarine.html

reply
Anthropic don't seem to know how to look after and keep customers.
reply
Everyone seems to unconditionally love Anthropic, but OpenAI has always had the best models… it just requires a bit more effort on the part of the user to actually leverage them.
reply
> because Anthropic lowered rate limits for individuals due to compute constraints

It's because they don't support OpenCode.

reply
I really hate this kind of behavior. Yeah, Anthropic may do some bad things, I don't know, but we can all see that Anthropic is always one step ahead of OpenAI. And just because Anthropic lowered rate limits for some people, people now start saying that Codex is way better than Claude Code / Claude Desktop.
reply
Codex is much worse than Anthropic's models. My experience is that I burn 10x the tokens using Codex compared to Sonnet 4.6.
reply
There was brief consternation when OpenAI swooped in to snatch up those DoD contracts but then the next model released and all is forgiven.
reply
Anthropic coming out to say they won't surveil Americans wasn't actually a positive for me. It meant they're okay with surveilling the rest of the world, which in turn signaled "fuck you, you're inferior, deal with it" to me (as someone from the aforementioned rest of the world).

When OpenAI snatched those contracts, it made me think no worse of OpenAI. The surveillance was already factored into how I saw them (both).

reply
And hopefully Anthropic has extra capacity then and I can return there.
reply
Uber, but AI!
reply
No, it’s because Anthropic can’t message anything to its customers without lying.
reply
Not only that, but Anthropic is now forcing users to give their biometric information to Palantir.

They're doing a slow rollout

reply
OAI already requires this. They both require identity verification in some cases.
reply