This has got to be bait...

1) OpenAI and Anthropic are killing it, and continue to do so; their coding tools are unmatched for professionals.

2) Local models don't hold a candle to SOTA models and there's nothing on the horizon that indicates that consumers will be able to run anything close to what you can get in a data center.

3) Coding is a killer product; OpenAI and Anthropic are raking in the cash. The top 3 apps in the app store are AI apps. Everyone who knows anything is using AI, every day, across the economy.

reply
The grandparent is definitely wrong on (3). Yes, coding is a killer product; I agree with you.

On (2), I agree with you for local models. BUT, there are also the open-source Chinese models accessible via OpenRouter. Your argument ("don't hold a candle to SOTA models") does not hold if the comparison is between those.

On (1), I agree more with the grandparent than with your assessment. Yes, OpenAI and Anthropic are killing it for now, but the time horizon is very short. I use Codex and Claude daily, but it's also clear to me that open source is catching up quickly, both w.r.t. the models and the agentic harnesses.

reply
Open models are good, but if you need a $10k GPU to run them then 99% of people are better off subscribing to OAI or CC.

Nowadays I also feel model performance matters less than the design of the tool harness, inference speed, and the other systems that surround a typical coding model.

reply
>BUT, there are also the open-source Chinese models accessible via OpenRouter.

I thought so myself, but after burning a lot of money on OpenRouter in a few days, I just subscribed to Z.ai's Coding Pro plan, and using the subscription is much, much friendlier to my wallet.

reply
> the open-source Chinese models accessible via OpenRouter

And? They aren't as good as SOTA models. Even the SOTA model provider's small models aren't worth using for many of my coding tasks.

reply
In my limited experience with it, GLM 5.1 is on par with Opus 4.6.
reply
I used GLM5 quite a bit, and I'd say it was maybe on par with Sonnet for most simple to medium tasks. Definitely not Opus though. Didn't test super long context tasks, and that's where I would expect it to break down. A recent study on software maintainability still showed Sonnet and Opus were peerless on that metric, although GLM series of models has been making impressive gains.
reply
I don't want to respond to 100 comments about the same thing, and this one happens to be on top, so, in my humble opinion:

(1): You don't have to be an Ed Zitron disciple to infer that OpenAI and Anthropic are likely overvalued and that Nvidia is selling everyone shovels in a gold rush. AI is a game-changing technology, but a shitty chat interface does not a company make. OpenAI and Anthropic need to recoup the astronomical costs of training these models. Models that are now being distilled[1] and are quickly becoming commoditized. (And frankly, models that were trained by torrenting copyrighted data[2], anyway.) Many have been calling this out for years: the model cannot be your product. And to be clear, OpenAI/Anthropic most definitely know this: that's why they've been acqui-hiring like crazy, trying to find that one team that will make the thing.

(2): Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this. Go use an almost-SOTA model (a big Deepseek or Qwen model) offered by many bare-metal providers and you'll see what "true" token prices should look like. The end-state here is likely some models running locally and some running in the cloud. But the current state of OpenClaw token-vomit on top of Claude is fiscally untenable (in fact, this is why Anthropic shut it down).

(3): This is typical Dropbox HN snark[3], of which I am also often guilty. I really don't think AI coding is a killer product, and this seems very myopic; engineers are an extreme minority. Imo, the closest we've seen to something revolutionary is OpenClaw, but it's janky, hard to set up, full of vulnerabilities, and you need to buy a separate computer. But there's certainly a spark there. (And that's personally the vertical I'm focusing on.)

[1] https://www.anthropic.com/news/detecting-and-preventing-dist...

[2] https://media.npr.org/assets/artslife/arts/2025/complaint.pd...

[3] https://news.ycombinator.com/item?id=9224

reply
> And to be clear, OpenAI/Anthropic most definitely know this: that's why they've been acqui-hiring like crazy, trying to find that one team that will make the thing.

Anthropic is up to $30B annual recurring revenue. I wish I had failing business models like that.

> Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this. Go use an almost-SOTA model (a big Deepseek or Qwen model) offered by many bare-metal providers and you'll see what "true" token prices should look like.

I'm not sure what you're trying to say here, but if you look at the providers for an "almost-SOTA model (a big Deepseek or Qwen model)", or at the price for Claude on AWS Bedrock, Azure, or GCP, you will quickly see inference is very profitable.

reply
> Anthropic is up to $30B annual recurring revenue. I wish I had failing business models like that.

And profit? A company can have $300B annual revenue, and still be a failing business if it's making a loss.

Somewhere along the line we seem to have forgotten this basic fact. Eventually there will be no more rounds of funding to feed the fire.

reply
Anthropic has raised $64B in total since they were founded.

Even if we measure profit in the very special Hacker News way, looking at money taken in from customer revenue against money invested, and we say they can't count things like building data centers or buying GPUs as capital expenses but instead have to count them against profit, then in two years' time they will have made more money than they have taken in investment.

That is extraordinary.

reply
Costs can always be optimized; revenue is much harder to optimize.
reply
It is easy to get $30B when you resell something you bought for $50B.
reply
The proverbial "50B" is investment in next year's model. The current model cost under "30B", and therefore "is profitable". It is a bet on scaling, yes, but that's been common throughout the industry (see, e.g., Amazon not being profitable for many years while it built out infrastructure).
reply
Also see the Dario interview with Dwarkesh:

> If every year we predict exactly what the demand is going to be, we’ll be profitable every year. Because spending 50% of your compute on research, roughly, plus a gross margin that’s higher than 50% and correct demand prediction leads to profit. That’s the profitable business model that I think is kind of there, but obscured by these building ahead and prediction errors.

(a lot more at the link)

https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A70...

reply
Except the rumors are that they subsidize even inference; it's not just that they have capex in training.
reply
The maths shows inference is very profitable. Look at how Google/AWS/Azure charge the same rates as Anthropic does for running Claude models.
reply
You're missing the forest for the trees. Per-token pricing is irrelevant when you're just trying to get shit done. I pay 20 bucks a month for OpenAI, but I likely use $200+ a month of tokens on coding alone (and that's just the raw tokens, ignoring all the harnessing on their end). Even OpenAI has said that they're losing money on the 200-dollar subscriptions[1]. This is not a viable business model. Why do you think they are introducing ads this year[2]?

[1] https://fortune.com/2025/01/07/sam-altman-openai-chatgpt-pro...

[2] https://openai.com/index/testing-ads-in-chatgpt/

reply
> Go use an almost-SOTA model (a big Deepseek or Qwen model) offered by many bare-metal providers and you'll see what "true" token prices should look like.

Qwen3.5-122B-A10B is $0.26 input, $2.08 output. Where's the subsidy? It's ten times cheaper than Opus. Or did you mean that we're subsidizing their training? But then "OpenClaw token-vomit on top of Claude is fiscally untenable" makes no sense.

Yeah, I don't know where you got your costs from. Bare metal providers are significantly cheaper than Anthropic.

reply
Maybe he's comparing the rental price of a bare-metal server on its own, and doesn't realise how drastically cheaper it is for an API provider to batch requests together.
reply
deleted
reply
No killer product? Coding assistants and LLMs in general are the single most awe-inspiring achievement of humanity in my lifetime, technological or otherwise. They've already massively improved my life and others', and they're only going to get better. If pre- and post-industrial-revolution used to be the major binary delineation of our history, I'm fairly confident it will soon be seen as pre- and post-AI instead.
reply
I know right? 8-year-old me dreamed of being able to articulate software to a computer without having to write code. It (along with the original Stable Diffusion) is definitely one of the coolest inventions to come along in my lifetime.
reply
Coding assistants are currently quite hard to run locally with anything like SOTA abilities. Support in the most popular local inference frameworks is still extremely half-baked (e.g. no seamless offload for larger-than-RAM models, no support for tensor-parallel inference across multiple GPUs or multiple interconnected machines), and until that improves it's hard to justify spending money on uber-expensive hardware one might be unable to use effectively.
reply
This is an argument against the grandparent's points (1) and (2), not their point (3).
reply
It's one clear argument for the (so get to work!) part.
reply
Computers get better and cheaper. That’s not a forever problem.
reply
Source?

GPU and RAM prices have definitely not made consumer PCs cheaper than they were before Bitcoin blew up or before AI blew up.

Maybe you could make an argument that they are more cost-efficient for the price point... But that's not the same as cheaper when every application or program is poorly optimized. For example, why would a browser take up more than a GB or two of RAM?

And I'd postulate that R&D to develop localized AI is another example; the big players seem hellbent that there needs to be a moat and that it's data centers... the absolute opposite of optimization.

reply
Moore's Law.

We've had RAM shocks before. We nerds can't control Wall Street or the Virginians who like to break the world every so often for the lulz. However, a wobble on the curve doesn't change the curve's destination.

reply
You have to look a bit more long-term? 256MB of what is today slow-af RAM used to be pretty pricey. Prices will pull back.
reply
No killer products... just robots that can do vulnerability analysis at the level of a decent security engineer and write code without tiring.
reply
I've also been using the LLM in Posthog and it has been impressive. I need to check if I can also plug an MCP/Skill into my actual Claude Code so that I can cross-reference the data from my other data sources (Stripe, local database, access logs, etc.) for in-depth analysis.
reply
This might be up your alley - it has Posthog and a ton of other SaaS tools connected so you can run analysis across quantitative/qualitative data sources: https://dialog.tools
reply
> Coding assistants and LLMs in general are the single most awe-inspiring achievement of humanity in my lifetime

Landing a man on the moon is way more impressive. Finding several vaccines for a once-in-a-century pandemic within a year of its outbreak is an achievement that in its impact and importance dwarfs what the entire LLM industry put together has achieved. The near-complete eradication of polio, once again, way more important and impactful.

reply
Those are all good things, but with the current AI boom we've invented something with the potential to invent those kinds of things on its own, if not now then in the near future. It's far more important and impactful to invent a digital mind that can invent an arbitrary number of vaccines than to just invent one vaccine, no matter how hard it was to invent the vaccine by hand.
reply
yeah, painting yourself into a corner at 10x speed is hardly the most awe-inspiring achievement of humanity.
reply
> no moat

I'd like to think the superior product wins. But Windows still thrives despite widespread Linux availability. I think sometimes we can underestimate the resilience of the tech oligopolies, particularly when they're VC-funded.

reply
VC can spend all the money in the world and it won't matter if the cost of switching providers is effectively zero.

If I want to switch from Windows to Linux, I have to reconsider a whole variety of applications, learn a different UX, migrate data, all sorts of annoyances.

When I switch between Codex and Claude Code, there is literally no difference in how I interact with them. They and a number of other competitors are drop-in replacements for each other.

reply
>I'd like to think the superior product wins. But Windows still thrives despite widespread Linux availability.

That's because by most metrics Linux is inferior to Windows.

reply
I don't see how it's possible to think this. AI coding assistants are some of the most useful technologies ever created, and model quality is by far the most important thing, so it doesn't make sense why local inference would be the path forward unless something fundamentally changes about hardware.
reply
The hardware will change. We know that.
reply
What benefit is there to dropping $50k on GPUs to run this personally besides being a cool enthusiast project?
reply
Intel has just released a high VRAM card which allows you to have 128GB of VRAM for $4k. The prices are dropping rapidly. The local models aren't adapted to work on this setup yet, so performance is disappointing. But highly capable local models are becoming increasingly realistic. https://www.youtube.com/watch?v=RcIWhm16ouQ
reply
That's four 32GB GPUs with 600GB/s of bandwidth each. This model is not running on GPUs of that scale. I think something like 96GB RTX PRO 6000 Blackwells would be the minimum to run a model of this size with performance in the range of subscription models.
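As a back-of-envelope: single-stream decode is roughly memory-bandwidth-bound, so tokens/s ≈ bandwidth / bytes read per token. A sketch in Python, where the active-parameter count and both bandwidth figures are my own placeholder assumptions, not published specs:

    # Rough upper bound on decode speed for a bandwidth-bound MoE model.
    # All figures below are illustrative assumptions, not published specs.
    active_params = 40e9     # hypothetical active params per token for a big MoE
    bytes_per_param = 0.5    # 4-bit quant
    bytes_per_token = active_params * bytes_per_param  # ~20 GB touched per token

    for name, bw_gb_s in [("one 600GB/s card", 600), ("RTX PRO 6000 class", 1800)]:
        print(f"{name}: ~{bw_gb_s * 1e9 / bytes_per_token:.0f} tok/s upper bound")

Real-world numbers come in well under that bound, but it shows why bandwidth, not raw VRAM, sets the performance class.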
reply
> I think something like 96GB RTX PRO 6000 Blackwells would be the minimum to run a model of this size with performance in the range of subscription models.

GLM 5.1 has 754B parameters tho. And you still need RAM for context too. You'll want much more than 96GB of RAM.
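Back-of-envelope on the weights alone, taking the 754B figure at face value (parameters × bytes per parameter, a sketch in Python):

    # Weight memory for a 754B-parameter model at common quantizations.
    # KV cache for long contexts comes on top of this.
    params = 754e9
    for label, bytes_per_param in [("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
        print(f"{label}: ~{params * bytes_per_param / 1e9:.0f} GB")
    # fp16 ~1508 GB, 8-bit ~754 GB, 4-bit ~377 GB

So even a 4-bit quant is around 4x a single 96GB card before you add any context.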

reply
Why would anyone need more than 640KB of memory?
reply
Exactly the point though. In the 640KB days there was no subscription to ever increasing compute resources as an alternative.
reply
Well, there kinda was - most computing then was done on mainframes. Personal / Micro computers were seen as a hobby or toy that didn't need any "serious" amounts of memory. And then they ate the world and mainframes became sidelined into a specific niche only used by large institutions because legacy.

I can totally see the same happening here; on-device LLMs are a toy, and then they eat the world and everyone has their own personal LLM running on their own device and the cloud LLMs are a niche used by large institutions.

reply
The difference is that computing after the text-terminal era is latency- and throughput-sensitive for the user. LLMs are not, particularly.
reply
Sorry, I don't understand that comment. Can you clarify, please?
reply
My point is LLMs aren't more usable if the hardware is in your room versus a few states away. Personal computers still to this day aren't great when the hardware is fully remote.
reply
Agreed. But you couldn't do much on a PC when they launched, at least compared to a mainframe. The hardware was slow, the memory was limited, there was no networking at all, etc. If you wanted to do any actual serious computing, you couldn't do that on a PC. And yet they ate the world.

I can easily see the advantage, even now, of running the LLM locally. As others have said in this topic. I think it'll happen.

edit: thanks for clarifying :)

reply
Is it so hard to project out a couple product cycles? Computers get better. We've gone from $50k workstations to commodity hardware several times before.
reply
Subscription services get all the same benefits from computer hardware getting better. But due to scale, batching, and resource utilization, they'll always be able to take more advantage of it.
reply
A local model will run exactly the same tomorrow, and the next day, and the day after that, and 10 years from now. It will be just as smart as the day you downloaded the weights. It won't stop working, exhaust your token quota, or get any worse.

That's a valuable guarantee. So valuable, in fact, that you won't get it from Anthropic, OpenAI, or Google at any price.

reply
That's why we all still use our eMachines "Never Obsolete" PCs. They work just the same as they did 20 years ago... though probably not, because I've never heard of hardware that's guaranteed not to fail.
reply
Agree directionally, but you don't need $50k. $5k is plenty; $2-3k is arguably the sweet spot.
reply
As a local LLM novice, do you have any recommended reading to bootstrap me on selecting hardware? It has been quite confusing being a latecomer to this game. Googling yields a lot of outdated info.
reply
First answer: If you haven't, give it a shot on whatever you already have. MoE models like Qwen3 and GPT-OSS are good on low-end hardware. My RTX 4060 can run qwen3:30b at a comfortable reading pace even though 2/3 of it spills over into system RAM. Even on an 8-year-old tiny PC with 32GB it's still usable.
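If you want to sanity-check your own machine first, here's a minimal smoke-test sketch with the ollama Python client (assuming the ollama server is running and you've already pulled the model):

    # Minimal local-model smoke test via the ollama Python client.
    # Assumes `ollama serve` is running and `ollama pull qwen3:30b` has completed.
    import ollama

    response = ollama.chat(
        model="qwen3:30b",
        messages=[{"role": "user", "content": "Summarize what an MoE model is."}],
    )
    print(response["message"]["content"])

If that generates at a comfortable reading pace, you may not need new hardware at all.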

Second answer: ask an AI, but prices have risen dramatically since their training cutoff, so be sure to get them to check current prices.

Third answer: I'm not an expert by a long shot, but I like building my own PCs. If I were to upgrade, I would buy one of these:

Framework Desktop with 128GB for $3k, or mainboard-only for $2700 (could just swap it into my gaming PC). Or any other Strix Halo (Ryzen AI 385 and above) mini PC with 64/96/128GB; more is better, of course. Most integrated GPUs are constrained by memory bandwidth. Strix Halo has a wider memory bus, so it's a good way to get lots of high-bandwidth shared system/video RAM relatively cheap. 380 = 40%; 385 = 80%; 395 = 100% GPU power.

I was also considering doing a much hackier build with 2x Tesla P100s (16GB of HBM2 each, about $90 apiece) in a Precision 5820 (cheap, with lots of space and power for GPUs). Total about $500 for 32GB of HBM2 + 32GB of system RAM, but it's all 10-year-old used parts, you need to DIY a fan setup for the GPUs, and software support is very spotty. Definitely a tinker project; here there be dragons.

reply
Agree on the Framework. Last week you could get a Strix Halo for $2700 shipped; now it's over $3500. Find a deal on an NVMe, and the Framework with the Noctua is probably going to be the quietest; some of them are pretty loud and hot.

I run Qwen 122B with Claude Code and nanoclaw. It's pretty decent, but this stuff is nowhere near prime-time ready, though it's super fun to tinker with. I have to keep updating drivers, and I see speed increases and stability fixes being worked on. I can even run much larger models with llama.cpp (--fit on), like Qwen 397B and I suppose any larger model like GLM; it's slow but smart.

reply
The 4-bit quants are 350GB; what hardware are you talking about?
reply
qwen3:0.6b is 523MB; what model are you talking about? You seem to have a specific one in mind, but the parent comment doesn't mention any.

For a hobby/enthusiast product, and even for some useful local tasks, MoE models run fine on gaming PCs or even older midrange PCs. For dedicated AI hardware I was thinking of Strix Halo, which with 128GB currently runs $2-3k. None of this will replace a Claude subscription.

reply
> qwen3:0.6b is 523mb, what model are you talking about?

1) What are you going to use that for? A 0.6B model gives you, at most, what you could get from Siri when it first launched, unless you do some tuning.

2) Pretty clear that they are talking about GLM-5.1 4-bit quant.

reply
Google wouldn't release Gemma 4 if it were similarly good to Gemini.

We're probably talking about a year's difference in progress.

It's also still quite expensive for an average person to consume any of it, whether due to hardware investment, energy costs, or API costs.

Also, professionally, I don't think anyone will really spend a little bit less money to have the third-best model running if they can run the best model.

I'm happy that we're reaching levels where this becomes an alternative if you value openness and control, though.

reply
deleted
reply
(1) is absolutely not true if you actually use these models on a regular basis and include Google in here too. The difference in reliability beyond basic tasks is night and day. Their reward function is just so much better, and there are many nuanced reasons for this.

(2) is probably true but with caveats. Top-tier models will never run on desktop machines, but companies should (and do) host their own models. The future is open-weight though, that much is for sure.

(3) This is so ignorant that others have already responded to it. Look outside of your own bubble, please.

reply
> Top-tier models will never run on desktop machines

Sorry, but you don't know that

reply
I mean, it's not hard to understand that if a good model can run on consumer hardware, even better models can run in data centers.
reply
If we get to the point where a local model can reliably do the coding for a good majority of cases, then the economic landscape changes significantly. And we are not that far from having big open-weight models that can do that, which is a first step.
reply
Larger, yes, absolutely. Better? Right now it seems that bigger is better, but if we are thinking about the long-term future, it's not obvious that there isn't a point of diminishing returns with regard to size. I can also imagine a breakthrough where models become much smaller, with the same or better capabilities than the current, very large ones.
reply
You are always going to get the same scaling laws in model size regardless of what else you do, so the same degree of improvement seen now relative to the smaller models will be achievable in the future. Yes, small models may be on par with previous generation large models, but the same is true for processors and you don't see supercomputers going away. It's the same principle.
reply
I was trying to use Claude.ai today to learn how to do hexagonal geometry.

Every time I asked a question it generated an interactive geometry graph on the fly in Javascript. Sometimes it spent minutes compiling and testing code on the server so it could make sure it was correct. I was really impressed.

Anyway, I couldn't really learn anything, since when the code didn't work I wasn't sure if I had ported it wrong or the AI got it wrong, so I ended up learning how to calculate SDFs and pixel-to-hex-grid conversion from tutorials I found on Google instead.
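For anyone curious, the pixel-to-hex-grid part turns out to be short once you see it. A sketch in Python, assuming pointy-top hexes and axial coordinates (the convention the tutorials I found use):

    import math

    def pixel_to_hex(x, y, size):
        # Fractional axial coordinates for a pointy-top hex grid.
        q = (math.sqrt(3) / 3 * x - 1.0 / 3 * y) / size
        r = (2.0 / 3 * y) / size
        return hex_round(q, r)

    def hex_round(q, r):
        # Round fractional axial coords to the nearest hex via cube rounding:
        # round all three cube coords, then recompute the one with the largest
        # rounding error so that q + r + s == 0 still holds.
        s = -q - r
        rq, rr, rs = round(q), round(r), round(s)
        dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
        if dq > dr and dq > ds:
            rq = -rr - rs
        elif dr > ds:
            rr = -rq - rs
        return rq, rr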

reply
This is also my exact experience
reply
Posted this after mythos came out? The chutzpah.
reply
No moat: yes. Cooked: no. It's a race. Why assume they're going to lose? It relies on (2), which is only true if AI usefulness plateaus at some level of compute. That's a huge claim to be making at this stage. (3) AI has lots of killer products already. The big one is filling in moats. Unrealized potential though, for sure.
reply
>(1) OpenAI & Anthropic are absolutely cooked; it's obvious they have no moat

I think big corporations will continue to use them no matter how cheap and good other models are. There's a saying: nobody ever got fired for buying IBM.

reply
How good would open source models be if they couldn't distill higher quality private models?
reply
(3) is simply a lie spread by engineers who have no other context. I manage some real estate (mid-term rentals), and everyone I know has switched over to AI robo-handlers to do the contact at this point. It's almost a passive investment. Some can even handle interfacing with contractors and service requests for you. It has revolutionized the field, in my opinion.
reply
The model is the killer product
reply