There is minimal downside to switching to open models

upvote

There is minimal downside to switching to open models

(www.marble.onl)

357 points

by amarble21 hours ago |

upvote

by spiralcoaster1 hours ago|

[-]

This can't be for real.

The title asserts there is minimal downside to switching to open models, but the article provides zero evidence that this is true, and the author hasn't even attempted it yet. The end of the article states "I’m hoping it’s going to be minimal".

I wonder if I can get a post to the front page with the title: "There are no real barriers to humans colonizing Mars next month". And at the end, "I'm hoping there are no real challenges."

reply

upvote

by TulliusCicero1 hours ago|

[-]

Yeah, I was thinking the exact same thing:

> There remains a clear penalty for being an open LLM user. Every leaderboard consistently gets topped by proprietary models served over API. Today on June 21, 2026, Claude and GPT are at the top of the Artificial Analysis intelligence leaderboard. That’s from the performance side. The compatibility side is worse too. Claude code just works, and more generally, the big two provide nice APIs that make them easy to use, and, even if it’s a low bar, are “trustworthy” in the sense that we’ve largely all agreed we don’t mind sending them our LLM queries and trust them to handle them appropriately.

> Open models are served via various means, some by the companies that released them and some by third parties like OpenRouter. Unfortunately, both of these routes are dodgier in terms of privacy and data sharing, and I would not feel the same comfort sending API calls containing client or confidential data to them3.

> The other option or course is to run them yourself. This solves the privacy issue but is at least two of expensive, complicated, and comparatively slow.

So...there's actually quite a bit of downside, then? Why the misleading title?

reply

upvote

by fracus9 minutes ago|

[-]

I felt your exact experience. Felt like the title and article simply served to complain about Claude's impending ID rules.

reply

upvote

by coffinbirth13 hours ago|

[-]

> Open models are served via various means, some by the companies that released them and some by third parties like OpenRouter. Unfortunately, both of these routes are dodgier in terms of privacy and data sharing, and I would not feel the same comfort sending API calls containing client or confidential data to them.

That's why I'm using eurouter.ai with the following routing rule for all my requests:

  {
    "model": "glm-5.2",
    "models": [
      "deepseek-v4-pro",
      "deepseek-v4-flash"
    ],
    "provider": {
      "allow_fallbacks": true,
      "data_collection": "deny",
      "data_residency": "EU",
      "max_retention_days": 0,
      "eu_owned": true
    }
  }

Sure, it's quite expensive, but at least on a legal side data privacy is ensured. I trust them more than e.g. Anthropic, OpenAI or OpenRouter.

Personally, I find it morally unacceptable to use U.S. AI tools, because I do not want to support them financially and thus support the crimes they are involved in[1].

[1]: https://news.ycombinator.com/item?id=48512339

reply

upvote

by himata411311 hours ago|

[-]

The part that gets me about anthropic red lines is "of Americans", okay so the rest of the civilized world is up for grabs then? It's okay to destabalize allies with sabotaged tests (in machine learning) and data exfiltration outside America?

What gets me the most is that they claim that the model should follow the https://www.anthropic.com/constitution and they claim that it's embedded into the model. However, system prompts in claude code and cowork re-iterate all of these points and if they're embedded you shouldn't need to do that. Now, if you ask the API version of claude to be a hitler supporter with enough prompt engineering it will become one which directly contradicts what they claim to do, opus 4.7 specifically will be happy to create anti-(insert minority group) propaganda although I haven't had the same success with 4.8 thus far, but I also haven't been motivated enough to push it in that direction yet since I've been more interested in exploting the cyber capabilities of the model.

My conclusion from the very start is that Anthropic's strategy are pure optics and considering the fact that there was an outpoor of support for the company I think it has been very successful.

reply

upvote

by dminik8 hours ago|

[-]

Yeah, it was funny seeing a bunch of people going like "Anthropic is fighting for privacy" meanwhile I'm like "Uhh, what about the other 8 billion people?"

On second thought, it's not funny.

reply

upvote

by beng-nl6 hours ago|

[-]

As a thought experiment - such shocks (govt pressure to use models for bad purposes and govt excluding access to non-Americans) coming early in the ‘ai revolution’ will wake up the rest of the world sooner that they have to get their act together to stay competitive without relying on USA. Just like with nato.

reply

upvote

by throwaw1243 minutes ago|

[-]

> The part that gets me about anthropic red lines is "of Americans", okay so the rest of the civilized world is up for grabs then?

And this is coming from a CEO who constantly claims moral superiority and advances the idea that China is bad

reply

upvote

by nerdsniper11 hours ago|

[-]

> The part that gets me about anthropic red lines is "of Americans", okay so the rest of the civilized world is up for grabs then? It's okay to destabalize allies with sabotaged tests (in machine learning) and data exfiltration outside America?

Regardless of Anthropic's "moral" position (inasmuch as a corporation can even have morals) against spying on non-Americans, they would have no way to enforce that limitation against the government because non-citizens outside of the USA have no protections from the intrusions of the US government.

reply

upvote

by cbolton7 hours ago|

[-]

They can include these limitations in a contract which can be enforced like any contract.

reply

upvote

by nerdsniper4 hours ago|

[-]

FISA Section 702 (50 U.S.C. § 1881a) or CLOUD Act could be used to override any contractual terms that US government agencies may have agreed to. Those clauses would be unenforceable / unexecutable.

More generally it would be overpowered by the Sovereign Acts Doctrine.

The facts aren’t identical to the 2008 Yahoo FISCR case but that case sets the tone for how any clauses like this would just be brushed under the rug.

reply

upvote

by NewsaHackO4 hours ago|

[-]

I don't think they can, at least if they are making an argument for why the Defense Production Act should not apply to them. Their original argument is that they will not help with anything that is unconstitutional, such as the unlawful spying on American citizens, without a warrant.

reply

upvote

by 3 hours ago|

[-]

deleted

reply

upvote

by jazzyjackson4 hours ago|

[-]

I don’t think Defense Production Act lets the government takeover what you produce, just that if you sell something you have to prioritize selling to the Feds. There is also precedent that code is speech and the government cannot compel speech (this came up during the debacle where FBI wanted a backdoor to unlock iPhones and Apple said no, we’re not building that)

reply

upvote

by nerdsniper3 hours ago|

[-]

It’s slightly nuanced, but during COVID-19 the DPA was applied to GM for production of ventilator machines. GM had not yet ever made any ventilator machines, and had only just one week earlier begun laying the groundwork to partner with Ventec Life Systems to look into retooling their automotive electronics factory in Kokomo, IN to produce ventilators.

I agree that the Apple case indicates that there’s a lot of uncertainty around this type of issue, at least post 1953 when title II of the DPA expired after Youngstown Sheet & Tube Co. v. Sawyer (1952)

reply

upvote

by oefrha4 hours ago|

[-]

> anthropic red lines

Alleged red lines. Could be just talking points for garnering sympathy. Big tech aren’t exactly known for being truthful, especially big tech partnering with esteemed Palantir.

reply

upvote

by 2 hours ago|

[-]

deleted

reply

upvote

by avadodin9 hours ago|

[-]

These companies are so good at selling their product's likely incompetence as possibly intentional subversion.

reply

upvote

by johndough12 hours ago|

[-]

I had a look at eurouter.ai and it seems like an extremely bad offer.

- The prices are ridiculous (15 % markup for free account).

- They have a rate limit of 1000 requests per month, unless you pay 40€ per month for ... what exactly is their value proposition?

- They have a single provider (TensorX) for DeepSeek-V4-Pro, with a cache read cost that is over 100 times higher than DeepSeek ($0.44 vs $0.003625). Notably, I had to look at the TensorX website for that information, since I could not find any information about cached token cost on eurorouter.ai.

reply

upvote

by qznc10 hours ago|

[-]

I guess the prices are for "EU owned" instead of "EU hosted". The data centers in the EU where you can rent GPUs is mostly US companies.

reply

upvote

by trollbridge7 hours ago|

[-]

It looks like a business opportunity, then, to provide inference that is EU-local and/or EU-owned.

If there aren't enough businesses who want to do this, the EU should figure out how it can properly incentivise that to change.

reply

upvote

by imhoguy7 hours ago|

[-]

Hosting anything in EU must cover redtape and carbon taxes in electricity bill.

reply

upvote

by jampekka7 hours ago|

[-]

The markup is not going to the providers, only the router. It seems more like eurouter found a niche it can milk for a while.

reply

upvote

by Grombobulous6 hours ago|

[-]

That seems pretty unsubstantiated. Hetzner proves that EU data center != expensive.

Low carbon does not equal expensive, either. Solar is the cheapest power generation method. Solar plus grid scale batteries is in the same cost ballpark as natural gas.

There’s nothing about data centers that is inherently a high carbon business. It’s only a high carbon business in places like the US where political leadership purposefully fights against renewable energy projects that private businesses want to undertake on their own dime.

reply

upvote

by KronisLV7 hours ago|

[-]

Actually got curious about other alternatives to OpenRouter and looked into it a bit.

EURouter (Amsterdam): https://www.eurouter.ai/pricing

Eden AI (France): https://www.edenai.co/pricing

nexos.ai (Lithuania): https://nexos.ai/pricing/

Requesty (Germany): https://www.requesty.ai/pricing

Cortecs (Austria): https://cortecs.ai/pricing

Nordference (Estonia): https://nordference.ai/pricing

Guess those are really popping up as mushrooms, eh? Not an endorsement of any of those on my part cause I haven't personally used them, but seems like there are at least options for those who need them.

reply

upvote

by root-parent8 hours ago|

[-]

Crimes does not even starts to describe it:

"AI-assisted targeting in the Gaza Strip" - https://en.wikipedia.org/wiki/AI-assisted_targeting_in_the_G...

"Palantir allegedly enables Israel's AI targeting in Gaza, raising concerns over war crimes" - https://www.business-humanrights.org/de/neuste-meldungen/pal...

"What The Wounds Are Telling Us" - https://www.volkskrant.nl/kijkverder/v/2025/gunshot-palestin...

reply

upvote

by bandrami12 hours ago|

[-]

If data security is an actual concern I don't think there's a solution other than biting the bullet and self-hosting.

reply

upvote

by fg1377 hours ago|

[-]

If your only concern is data residency, data privacy and sharing, why not just use bedrock with the processing region locked to eu-west-2? For sure, it's not an European company serving the LLM, but it satisfies your requirements otherwise and is trusted by tons of companies worldwide.

reply

upvote

by helloplanets6 hours ago|

[-]

Anthropic already explicitly communicated that they'll store and check all the data from Bedrock or any platform, even if you've selected zero data retention, if using Mythos class models. To use these models on any platform, you'll have to accept these terms regardless of the region.

> Limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered.

> Change applies to organizations that have set up workspaces with zero data retention (ZDR) in Claude Console, use Claude Code with ZDR in Claude Enterprise, or access Claude through AWS Bedrock, Google Cloud Agent Platform, or Microsoft Foundry with ZDR.

https://support.claude.com/en/articles/15425996-data-retenti...

reply

upvote

by fg1374 hours ago|

[-]

The original comment is about GLM/deepseek models. As you already pointed out, this applies if you use those specific Claude models on any platform, so I don't know what the point is.

reply

upvote

by quikoa6 hours ago|

[-]

> it's not an European company serving the LLM

That's a pretty big downside if data privacy and sharing is one of the main concerns.

reply

upvote

by fg1374 hours ago|

[-]

I'd like to see some real reasoning here that is based on facts.

reply

upvote

by layer81 hours ago|

[-]

One of the facts is the existence of https://en.wikipedia.org/wiki/CLOUD_Act, in conjunction with https://www.law.cornell.edu/uscode/text/18/2705 and https://www.nsa.gov/Signals-Intelligence/FISA/.

reply

upvote

by trollbridge8 hours ago|

[-]

The great part about open models is that you can do this.

Do you have a sound reason to need EU data locality? You can.

Do you want the confidence (and are willing to accept the expense) of only running models on local hardware you control? You can.

Do you want the cheapest possible option - choosing a Chinese, for example, provider, or perhaps a provider offering it for free where you agree they can use your prompts? You can.

Do you need to comply with some kind of regulation like GDPR or rules for contracting with the U.S. federal government? No problem. (Although I'm still waiting for DeepSeek V4 to show up on Amazon BedRock so it can be used from GovCloud...)

Do you have moral objections and want to actually live by them? You can.

reply

upvote

by WhyNotHugo4 hours ago|

[-]

You only need to worry about GDPR and the hoster being in the EU if you're giving the model actual access to production data — which you shouldn't anyway. Use the model to write code that processes or analyses the data, so that process can easily be reproduced with deterministic results.

reply

upvote

by codedokode8 hours ago|

[-]

Not only it requires a minimum payment of 39 euro, it doesn't accept cryptocurrency althogh that can be worked around by buying a prepaid virtual card for crypto.

reply

upvote

by yogorenapan4 hours ago|

[-]

What services give you a prepaid virtual card for crypto without KYC?

reply

upvote

by ttoinou12 hours ago|

[-]

You dont care about which exact provider it is using behind the hood ?

reply

upvote

by Phlogi11 hours ago|

[-]

No, as long as they follow the requirements, especially the data privacy agreements. What would you? Price?

reply

upvote

by fredoliveira9 hours ago|

[-]

Output quality immediately comes to mind, of course.

Models are converging, but they converge in bands, and frontier is frontier. I would not like to have any workflows in any area of my business where output is generated by an assortment of models from different providers. For trivial, mundane tasks that might be fine, but it certainly doesn't apply across the board.

reply

upvote

by MarceColl6 hours ago|

[-]

Not caring about providers doesn't mean not caring about models.

reply

upvote

by ttoinou7 hours ago|

[-]

How do you know they're following requirements ? At least a quick search about the company providing the service would be useful

reply

upvote

by Phlogi5 hours ago|

[-]

Well there is an established legal system that i rely on. I don't have to check every provider, that is the value i get from this router service.

reply

upvote

by NewsaHackO4 hours ago|

[-]

The issue is that the legal system is retroactive, and information that is made public cannot be retracted. If a random company gets your sensitive data and disseminates it to the world, then sure, they will face legal repercussions, but the damage would have already been done to you.

reply

upvote

by vonneumannstan4 hours ago|

[-]

Lol what performative shite. Chinese astroturfing 101. You're either mentally ill or a shill.

reply

upvote

by simianwords12 hours ago|

[-]

GDPR compliant llm was a joke a few months back but here we are

reply

upvote

by speedgoose12 hours ago|

[-]

I work in Europe, sometimes with sensitive data, and LLMs weren’t an exception a few months ago.

Maybe it was funny to you, but designing data platforms that respect GDPR and involve LLMs is a thing.

reply

upvote

by throw123456789112 hours ago|

[-]

But is no joke anymore.

reply

upvote

by throwaway2744812 hours ago|

[-]

Why use EU specifically? I get not trusting the US, of course, but surely the EU isn't far behind in its desire to spy on its own citizens. Do you not live there?

reply

upvote

by earthnail12 hours ago|

[-]

From all the large governmental institutions, the EU is the one currently holding up traditional western values. That gives it street cred in this subject.

reply

upvote

by hdgvhicv11 hours ago|

[-]

https://www.theguardian.com/us-news/ng-interactive/2026/feb/...

The age old joke;

A Russian and an American are drinking at a bar

The Russian says "I'm impressed by american propaganda. It's so subtle but effective."

The american responds "What are you talking about, we don't do propaganda."

reply

upvote

by t-39 hours ago|

[-]

The version in my fortune file is better:

A Russian and an American get on a plane in Moscow and get to talking. The Russian says he works for the Kremlin and he's on his way to go learn American propaganda techniques.

"What American propaganda techniques?" asks the American.

"Exactly," the Russian replies.

reply

upvote

by 0xDEAFBEAD6 hours ago|

[-]

I'm of the opinion that there is considerably more wailing about US government propaganda than actual US government propaganda. People who reference supposed US government propaganda rarely provide much in the way of concrete examples. Probably because there are legal restrictions on covert propaganda in the US:

https://www.law.cornell.edu/wex/covert_propaganda

To be clear, I'm happy to grant that:

* The Pentagon won't provide jets for your war movie if your war movie portrays the US military in a bad light

* The US engages in information operations in foreign countries, e.g. discouraging people in the Philippines from getting the Chinese COVID vaccine

* Voice of America and similar US-government sponsored outlets are, in fact, sponsored by the US government

But the notion that covert, English-language US government propaganda is ubiquitous and effective seems like a half-baked, un-falsifiable conspiracy theory with little supporting evidence.

The internet is full of false or misleading claims about the US which go un-refuted. There's just way too much low-hanging fruit going un-picked here to believe that the USG is running massive English-language covert propaganda ops.

A specific example of a false anti-American claim which is extremely widespread: Many Europeans believe that the US promised to protect Ukraine in the 1994 Budapest Memorandum. This is false. We only promised to go to the UN Security Council, which we did. You can verify for yourself with a quick trip to the UN website, the memorandum is not very long: https://treaties.un.org/doc/Publication/UNTS/Volume%203007/P...

If the American government possessed the propaganda wizardry that people ascribe to it, I expect the entire internet would be well-acquainted with the actual contents of this memorandum. Instead, you have randos like me trying to fight a tsunami of misinformation (likely Ukrainian-origin) related to this memorandum, using only a shovel.

reply

upvote

by gpvos5 hours ago|

[-]

> Many Europeans believe that the US promised to protect Ukraine in the 1994 Budapest Memorandum.

European here, following the Ukraine situation closely. I absolutely never heard that one. The main issue in the 1994 Budapest Memorandum that has been mentioned in the media in recent years is that Russia would respect the independence, sovereignty, and existing borders of Ukraine, which is clearly there in article 1. Thanks for the link though, it is quite enlightening.

reply

upvote

by throwaway274484 hours ago|

[-]

Have you ever read Manufacturing Consent? A conspiracy is not necessary for wide-spread propaganda campaigns—just a confluence of incentives that act against the common interest (even in the US) but work in the interest of the ruling class.

reply

upvote

by zzzeek2 hours ago|

[-]

and then Chomsky goes on to form a deep online friendship with Jeffrey Epstein

reply

upvote

by rendx1 hours ago|

[-]

The idea of reading is that you yourself think and reason about it, not that you blindly trust anything based on the author's name, deeds, or reputation. Ad hominems are a cheap way to remain ignorant and - yet again - fall into the trap of blindly believing some other third party.

reply

upvote

by tumdum_6 hours ago|

[-]

> Many Europeans believe ...

> ... misinformation (likely Ukrainian-origin) ...

Your post is also "a half-baked, un-falsifiable conspiracy theory with little supporting evidence" ;)

reply

upvote

by 0xDEAFBEAD6 hours ago|

[-]

Can you point me to any sort of Ukrainian law which would prohibit this type of info op? See e.g. https://en.wikipedia.org/wiki/Ghost_of_Kyiv

If the US was attacked the way Ukraine was attacked, and foreign intervention was key to our survival as a nation, I expect the Pentagon would deploy foreign info ops in that situation. That doesn't seem like a heavy lift to me.

Occam's Razor: If something is a core/essential national interest, it's reasonable to expect a government to pull out all the stops. But governments are fairly ineffectual for the most part. Everyone saw how the USG mishandled e.g. COVID, mishandled the war with Iran, yet we expect the USG to be wizards at covert propaganda? It doesn't really track. I'm sure we are doing covert propaganda here and there, and we would ramp it up in an emergency.

Anyways, if you want to point to specific content you suspect as USG propaganda, be my guest. My point is, the fact that people rarely do this seems evidence against widespread USG propaganda. "They don't point it out because the propaganda is too good" has a suspicious un-falsifiable quality to it.

reply

upvote

by 65102 hours ago|

[-]

>I'm of the opinion that there is considerably more wailing about US government propaganda than actual US government propaganda.

okay....

>People who reference supposed US government propaganda rarely provide much in the way of concrete examples.

YOU'VE ALREADY SAID THAT

reply

upvote

by mopsi6 hours ago|

[-]

"UN Security Council action" is a broad term that can include deployment of international UN-led military forces, as in the Korean War: https://en.wikipedia.org/wiki/United_Nations_Command

A few years prior to the Budapest Memorandum, the UN Security Council had authorized military action to liberate Kuwait. 42 countries participated in the coalition that drove Iraqi forces out of Kuwait: https://en.wikipedia.org/wiki/Coalition_of_the_Gulf_War

The expectation at the time was clearly more than just "we'll bring it up at the UN for dicussion". The current weaseling over the exact wording looks weak and pathetic, and has a certain flavor of propaganda that tries to convince everyone of something that's not quite true. The fact remains that the US strong-armed Ukraine out of nuclear weapons, and when Ukraine was eventually invaded, tried to strong-arm Ukraine into surrender. This reflects very poorly on the US.

reply

upvote

by 0xDEAFBEAD5 hours ago|

[-]

"Russia blocks Security Council action on Ukraine"

...

"A ‘no’ vote from any one of the five permanent members of the Council stops action on any measure put before it. The body’s permanent members are: China, France, Russian Federation, the United Kingdom, and the United States."

https://news.un.org/en/story/2022/02/1112802

(emphasis mine)

This is 101-level UN stuff. If Ukrainian diplomats were unaware that Russia can veto Security Council resolutions, that means they were totally incompetent.

It's also misleading to say the US "strong-armed" Ukraine out of its nukes... it was originally Ukraine's idea to abandon nukes, and they didn't have the control codes for the nukes on their territory anyways. The US attempted influence via carrots (financial assistance), not sticks ("strong-arming").

In any case, we did far more than just bring it up at the UN for discussion. See this map from a year or two ago: https://pbs.twimg.com/media/HKNCFWPbEAA7p5g?format=jpg&name=...

Mostly, in response to US generosity, Europeans just complained that the US should give even more. Your comment illustrates this perfectly--you speak as though the US only responded via UN diplomacy, completely neglecting over one hundred billion dollars the US sent in Ukraine aid, to a country which is not even a treaty ally of ours. When Biden was president, right after he saved Ukraine's butt in the initial invasion, public opinion of the US in Europe was barely even net-positive.

The real question is why Europeans spend so much time harassing the US for Ukraine funds, and so little time harassing tight-fisted countries which are actually in Europe like Ireland, Switzerland, Austria, Spain, etc. The answer: Europe has a transatlantic philosophy that the US brings the guns and the Europeans bring the scolding. As long as Ireland/Switzerland/Austria/Spain nod along with the scolding, they are doing their part, as far as Europe is concerned.

reply

upvote

by mopsi2 hours ago|

[-]

  > This is 101-level UN stuff. If Ukrainian diplomats were unaware that Russia can veto Security Council resolutions, that means they were totally incompetent.

There are ways around it, if there's a will: https://en.wikipedia.org/wiki/United_Nations_General_Assembl...

It is safe to say that the present lack of leadership from the US was not foreseen at the time. It was unimaginable that Russia would launch a major ground war in Europe and that the American president would blame the victim of the aggression and try to coerce them into surrender while sucking up to the aggressor. This is not how things were conducted back then. It was the era of Schwarzkopfs showing strength and resolve by giving presentations on how coalition tanks had pummeled the enemy in the past few weeks, not of Sullivans showing weakness and indecisiveness by endlessly yapping about "escalation".

The core problem is that the US has spent almost a century embedding itself in all kinds of relationships (cultural, political, economic, military), but has lost the ability to carry out that central role. Biden did not save Ukraine. The limited but valuable military support fostered an unhealthy relationship that gave the US a veto over Ukraine's (and other allies') actions, but the US leaders do not have the statemanship to use that power responsibly. Biden's legacy is the shortsighted micromanagement that turned the fast and effective Ukrainian counteroffensives of 2022 into slow and costly trench warfare of 2026, all while emboldening enemies like Iran to launch assaults like October 7th.

reply

upvote

by wat100004 hours ago|

[-]

The Budapest Memorandum only requires going to the Security Council if nuclear weapons are involved. There's no required action at all for non-nuclear attacks. This isn't "weaseling over the exact wording," it's just the plain language of the memorandum.

It really amazes me how much misinformation is out there about this thing. It only has six points, each one a single paragraph long. It's very quick and easy to read, yet people apparently can't be bothered to look up the actual text of the thing they're discussing.

reply

upvote

by CamperBob24 hours ago|

[-]

You can argue all day about the letter versus the spirit of the Budapest memorandum, but good luck getting any other countries to give up their nukes in the future.

That's only one consequence of Trump's de-facto betrayal of Ukraine in support of his daddy figure in the Kremlin.

reply

upvote

by wat100003 hours ago|

[-]

I have a really hard time accepting the idea that the spirit of the memorandum was that the signatories should actively defend Ukraine against non-nuclear attack, when it would have been so easy to write that explicitly.

I completely agree about no countries giving up their nukes in the future, but that's a consequence of the weak agreement, plus other actions like knocking over Iraq and Libya but not North Korea, tearing up the JCPOA with Iran, and... well, it seems like non-proliferation is mostly lip service in general.

reply

upvote

by kortilla12 hours ago|

[-]

>traditional western values

This seems tautological because Europe is pretty weak on the values that people in the US might care about (freedom of speech, limited govt, etc).

What values specifically are you optimizing for here?

reply

upvote

by p_j_w12 hours ago|

[-]

> values that people in the US might care about (freedom of speech, limited govt, etc).

The US federal government forced Paramount to take Colbert off the air. Seems that people in the US don’t actually value these things.

> What values specifically are you optimizing for here?

Probably not being fascist.

reply

upvote

by Gigachad12 hours ago|

[-]

They currently have the military circling a pool to intimidate people trying to take photos of the botched paint job.

reply

upvote

by Applejinx4 hours ago|

[-]

In fairness, it's our Berlin Wall, and I absolutely would want a piece of the delaminating paint as a souvenir. The difference is we are so early into the collapse that the armed guards are still there. But yeah, it's definitely not just pictures, I'd want a piece of the blue stuff. It'd be a souvenir and also me taking it away from there, so win/win. Of course there's armed guards guarding the swamp now, what else would there be?

reply

upvote

by fuck_google11 hours ago|

[-]

Intimidation is the sincerest form of flattery.

reply

upvote

by throwaway2744812 hours ago|

[-]

> The US federal government forced Paramount to take Colbert off the air.

Not really; the Ellisons are quite close to Trump. Nobody was forced to do anything. Had the FCC actually revoked their license, and had Paramount actually been willing to fight, they could have sued. It's not easy to force anyone that rich to do anything; the state works on behalf of capital. It seems like europe is more aware of the meaningless bluster than the actual crimes being committed

There are much better things to point to to illustrate the deterioration of the rule of law, like blatantly illegal deportation of citizens without due process. Or raping children in concentration camps under the guise of cracking down on crime. We may never even know who was seized and what happened to them and there's little incentive for our very pro-corporate media to report on it.

But sure, paramount is the real victim here.

reply

upvote

by hvb27 hours ago|

[-]

You might want to read the 3rd paragraph of this article

https://en.wikipedia.org/wiki/Merger_of_Skydance_Media_and_P...

Read that timeline and then see if you're still convinced that they didn't at least seem to have done a thing or 2 to appease the federal government

reply

upvote

by solumunus12 hours ago|

[-]

The UK can arrest you for hate speech. You can disagree with that policy on free speech terms if you want, and that’s really a maximal free speech position. It’s a very strange position to hold if you’re claiming that the U.S. is better when it comes to free speech. The U.S. administration is engaged in active smear campaigns against anyone who speaks loudly against them, threatened to revoke licenses of media companies, they’re suing people and corporations to silence them and pressure them into conformance, they’re threatening to deport people who are simply expressing anti-Israel views, threatened to remove funding from universities, deployed the military in cities they don’t like for no other reason than intimidation of political rivals. This is just off the top of my head.

There’s just no comparison really. You must really be inhaling some nonsense X propaganda if you think government overreach is worse in Western Europe.

reply

upvote

by calgoo9 hours ago|

[-]

The UK is not the EU, the UK is US "lite", they have always been that way, thats not something new.

reply

upvote

by 7 hours ago|

[-]

deleted

reply

upvote

by tonyedgecombe9 hours ago|

[-]

> The UK can arrest you for hate speech.

https://www.facebook.com/story.php?story_fbid=13879460433775...

reply

upvote

by nxm9 hours ago|

[-]

I as a individual won’t get arrested for speaking my mind, and that’s much more important than some legal battle around corporate media.

„ deployed the military in cities they don’t like for no other reason than intimidation of political rivals” That’s one perspective on simply trying to enforce laws.

Moreover, let’s not forget about how Biden government tried to silence Rogan.

reply

upvote

by vrganj7 hours ago|

[-]

You know who else was simply enforcing laws? The Gestapo.

reply

upvote

by goodpoint10 hours ago|

[-]

> The UK can arrest you for hate speech.

And that's a good thing.

reply

upvote

by throwaway2744812 hours ago|

[-]

I'm honestly not really sure what "traditional western values" have to do with where to store data. What does that even refer to—individualism? Christianity? Representation in court by lawyers? How does this intersect with the topic at hand?

Edit: c'mon people, if you're going to use such ambiguous phrases at least have the spine to clue the reader in to what you want them to refer to in this context.

reply

upvote

by NonHyloMorph9 hours ago|

[-]

Well there have been a lot.. philosopy, polis, democracy, hemlock cup, enlightenment (note the perversion of "the dark enlightenment"), modernity, the resistance (against Nazism), psychoanalysis, postmodernism and critical studies (postmodernism in the genuine sense of the philosophies/theories that you would assign that label to and not in the misguided sense of relativism as arbitrarity; basically continental philosophy, frankfurt school (e.g. adorno horkheimer, habermas) and the french (e.g. foucault, derrida, deleuze (& guattari))

Of course there were also absolutism, colonialism, the jacobines, nazism & facism, to name just a few. Part of western values, from my perspective at least, is an implicit promise, that what happened in the 20th century with facism was the darkest hour, so to speak-> never again

reply

upvote

by cpursley12 hours ago|

[-]

With all the issues in the US and generally wrong direction, I can’t remember them ever arresting people for mean tweets in the way that Germany and the UK have. They all seem to be running full speed towards a surveillance state.

reply

upvote

by lmm12 hours ago|

[-]

> With all the issues in the US and generally wrong direction, I can’t remember them ever arresting people for mean tweets in the way that Germany and the UK have.

Then you haven't been paying attention. The constitution prevents citizens from being convicted, but that doesn't stop arrests or being turned away at the border (even for permanent residents who've lived in the US for decades), and US citizens don't seem to care, so it's cold comfort for many of us.

reply

upvote

by leptons11 hours ago|

[-]

>and US citizens don't seem to care

I think maybe you haven't been paying attention.

Most of us do care. Trump's approval rating is pretty low at 36%, and his disapproval rating is high. Just because he's still causing chaos doesn't mean the majority of us don't care about it. There's just no legal way to remove him, and his cronies simply won't do it - there's not enough votes in congress or he would have been gone after his first or second impeachment.

https://www.npr.org/2026/06/20/nx-s1-5861764/trumps-job-appr...

reply

upvote

by lysium11 hours ago|

[-]

I understand your point, however I don’t buy „there's just no legal way to remove him“. With so low ratings where are the daily protests against such type of government? Surely, nationwide daily protests would make elected officials reconsider their positions, given an upcoming midterm election, while there still is one.

Don’t get me wrong, I know the thousands reasons why you won’t join a protest, I’m „guilty“ myself. I just want to argue against your argument that I quoted because this puts all of us in an unhelpful victim mentality.

reply

upvote

by throwaway2744810 hours ago|

[-]

> Surely, nationwide daily protests would make elected officials reconsider their positions, given an upcoming midterm election, while there still is one.

Hah. When was the last time a non-violent protest yielded some kind of result by itself? Certainly never in american history.

Anyway, there are daily protests. They just aren't covered by the media. Hell, the protests for palestine never stopped... the media just never wanted to cover them.

reply

upvote

by t-38 hours ago|

[-]

Women's suffrage was relatively nonviolent.

reply

upvote

by trollbridge7 hours ago|

[-]

As was the civil rights era. The people who chose to be violent made it take longer, actually.

reply

upvote

by t-37 hours ago|

[-]

I don't think that's really accurate. Without the perceived threat from the militant factions, the peaceful protesters wouldn't necessarily have had political backing and support. Peaceful movements had been suppressed until that point, after all.

reply

upvote

by trollbridge3 hours ago|

[-]

I think it is. Change happened because a majority of reasonable people wanted it to change.

Terrorist attacks, kidnappings, etc made that change take longer. What made MLK Jr so unique was that he carried a message of peace, not a message of war.

The militant factions never had any real power and would have never been close to powerful enough to overthrow the government, and if they’d been more successful, would have swayed the masses’ opinions in the wrong direction.

reply

upvote

by 0xDEAFBEAD7 hours ago|

[-]

Largest protests in US history in the past year:

https://en.wikipedia.org/wiki/List_of_protests_and_demonstra...

reply

upvote

by t-38 hours ago|

[-]

Although 36% is low, it's not that low. The US's two-party system means that ~40-50% are aligned with either party and approve/disapprove on GP. The real number to pay attention to is the change over time in the approval rating: https://www.cnn.com/polling/approval/trump-cnn-poll-of-polls

Trump's highest rating was ~47% when he came into office, but he was pretty stably in the low 40s until the new war. The actual drop is somewhere from ~40-42 to ~36-38 - about 10% of his base. Significant, but probably not enough to actually matter unless it drops further.

reply

upvote

by rob7411 hours ago|

[-]

Then again, nationwide daily protests would give the Trump administration an excuse to send ICE / the army / whoever else they can send to the cities where the protests take place (I guess they would be mostly blue-leaning ones) to "restore order", and at the same time lay the groundwork for influencing the November elections.

But the turnout at the periodic nationwide "No Kings" protests has been very good, and they have fortunately stayed peaceful.

reply

upvote

by throwaway2744810 hours ago|

[-]

You'd think a "no oligarchs" protest would be a little more useful given that we aren't likely to revert to a monarchy any time soon.

checks notes what's this? The protests were organized by oligarchic lackeys? Hmm

reply

upvote

by lmm11 hours ago|

[-]

This isn't about Trump. No 4th amendment rights at the border has been an issue for at least 20 years, but US citizens don't care because it doesn't affect citizens.

reply

upvote

by zingar11 hours ago|

[-]

Yep, this was an issue long before Trump. They’ve just amped up the scale and stopped bothering with the deceit that they know doesn’t bother Americans.

reply

upvote

by PoignardAzur10 hours ago|

[-]

A 36% approval rating is sky-high for a president that started a pointless immensely costly war after getting elected on a platform of "no more costly wars" and is in the process of negotiating an immensely unfavorable deal with Iran after getting elected on a platform of "Obama's deal with Iran was terrible, I could do much better".

By contrast, Biden at the same point in his term was hovering around 39%, for the heinous crime of... rebuilding the US economy? Including some woke riders in his infrastructure bill?

At this point, a fair assessment of US citizens is that on average, they seem to consider that being a right-wing autocrat wannabe, threatening to invade allied countries "as a negotiating tactic", being a climate change denier, starting a humiliating failed war, trying to blackmail the press into compliance, etc, are about 3% worse than being a cringe center-left bureaucrat.

"US citizens don't seem to care" is an apt hyperbole.

reply

upvote

by throwaway2744810 hours ago|

[-]

It doesn't help that many of those "center-left" democrats (whatever that refers to) seem to be criticizing trump for letting iran off too easy, not you know starting a stupid war nobody in their right might would want, bombing a school, wasting american lives, driving up prices, risking the global economy, throwing lebanon under the bus... nope, he let iran off too easy. Cf cory booker

When the parties are both fucking stupid when it comes to issues that matter, the entire right/left spectrum goes out the window.

reply

upvote

by NamlchakKhandro7 hours ago|

[-]

NPR as your source isn't exactly a compelling argument tbh

reply

upvote

by pell7 hours ago|

[-]

Whenever I ask people to explain their issues with NPR it’s some cherry-picked news articles here and there that were somewhat biased. In my experience NPR often tries to be incredibly neutral, almost comically so, when criticizing any administration.

reply

upvote

by cpursley12 hours ago|

[-]

You are talking about something different (in bad faith). Please share a single instance of a US citizen being arrested for an offensive social media post.

reply

upvote

by lmm11 hours ago|

[-]

A 30 second search found me https://www.fire.org/news/he-spent-37-days-jail-facebook-pos... . You can beat the rap but you can't beat the ride. (And you're pretty thoroughly proving my point about US citizens not caring about anyone else)

reply

upvote

by nxm9 hours ago|

[-]

Yay you found a single instance, and more over there are legal means of recourse, unlike the the UK when you’re jailed or fined and that’s it

reply

upvote

by wat100004 hours ago|

[-]

Don't give a sarcastic reply about finding a single instance, when the request was literally "Please share a single instance." It's just silly.

reply

upvote

by t-39 hours ago|

[-]

The US has arrested many people for speech, and even made charges stick many time. A famous historical example is charging Eugene Debs (and many others) with sedition for opposing WWI and the draft. There was at least one case of being arrested for political social media posts, already linked in adjacent threads. Threats of violence or even sufficiently harsh language to cause fear-for-life can be a crime. "Revenge porn" and deepfakes have had laws passed curtailing them and prosecutions made. The US is certainly less restrictive of speech than other countries, but you're nowhere near entirely free to say or post anything you want.

reply

upvote

by 8 hours ago|

[-]

deleted

reply

upvote

by dminik8 hours ago|

[-]

The posts don't even need to be offensive, just uncomfortable:

https://www.fox4news.com/news/woman-arrested-facebook-post-c...

reply

upvote

by hdgvhicv11 hours ago|

[-]

https://youtu.be/tB3WVygAM8I

reply

upvote

by vrganj7 hours ago|

[-]

Why does it need to be a US citizen? Is mistreatment okay if they're not a citizen? That clause reads extremely chauvinist to me.

reply

upvote

by nxm9 hours ago|

[-]

A few people being stopped to check if their residency is valid is more than fine considering the last admin flooded the country with 20mil migrants with its open border policy

reply

upvote

by delis-thumbs-7e11 hours ago|

[-]

[x] Doesn’t know UK not in EU [x] Thinks people inciting violence online a free speech -issue [x] Calls Germany a surveillance state when US uses Palantir - a US company - to openly spy on its citizens

X seems to work great. Inciting men in with gambling, porn, crypto, ai and other broistan staples, then feeding them far-right nonsense info points.

reply

upvote

by Accacin6 hours ago|

[-]

I can immediately tell you're not arguing in "good faith" when you resort to "mean tweets".

The numbers commonly being reporting include stalkers, criminals, etc.

You don't get arrested for being politically incorrect in the UK. You get arrested for posting something threatening, harassing, inciteful, or grossly indecent. Also, being arrested and being charged are two completely separate things.

reply

upvote

by rob7411 hours ago|

[-]

By "mean tweets" I assume you mean death threats? How about not threatening to kill someone on social media, is that so hard to do?

reply

upvote

by Latty12 hours ago|

[-]

They literally arrested people for quoting Charlie Kirk in tweets after his death.

reply

upvote

by cpursley12 hours ago|

[-]

Source? Or are you another "trust me bro" Redditor.

reply

upvote

by leptons11 hours ago|

[-]

One guy spent 37 days in jail for re-posting a thing trump said ("We have to get over it" in reference to a school shooting), after Charlie Kirk was killed.

https://www.bbc.com/news/articles/cg7pyjxjxrvo

reply

upvote

by nxm9 hours ago|

[-]

So one guy, and now he’s suing which is a form of justice. No such path in the UK…. You’re fined and/or arrested and that’s all

reply

upvote

by vrganj7 hours ago|

[-]

Are you trying to say one can't sue in the UK? I'm confused by why you would think such a thing.

reply

upvote

by seba_dos18 hours ago|

[-]

Yes, the US doesn't arrest people for death threats on Twitter, it's too busy actually killing those that oppose ICE.

reply

upvote

by gpvos4 hours ago|

[-]

I know of many examples from the US and none from Germany and the UK. If they're truly so plentiful, please enlighten me with a link or two of the latter.

reply

upvote

by solumunus12 hours ago|

[-]

That’s not the EU.

reply

upvote

by 0xDEAFBEAD7 hours ago|

[-]

"The situation for free speech in Europe is even worse than I thought"

https://eternallyradicalidea.com/p/the-situation-for-free-sp...

reply

upvote

by NicuCalcea6 hours ago|

[-]

Same old tired arguments from Americans about how if you don't let fascists have free speech, it's not really free speech.

reply

upvote

by 0xDEAFBEAD6 hours ago|

[-]

Who controls the definition of "fascist"?

In any case, practically speaking, censorship helped the rise of the Nazis: https://www.fire.org/news/blogs/eternally-radical-idea/would...

You can see far-right parties surging across Europe. Speech restriction isn't just authoritarian, it's also counterproductive.

As an American I am actually quite worried about Europe's far right. Those guys are very scary, and it's creepy the way they have been able to influence the right here in the US. The MAGA movement was far more multicultural back in the 2010s, before Europe's far right was able to influence it with their ethnic cleansing and pogrom fantasies.

If you're in a hole, maybe stop digging?

reply

upvote

by Phlogi11 hours ago|

[-]

US Data Privacy is not sufficient.

reply

upvote

by throwaway2744811 hours ago|

[-]

For what? Does the EU not want to spy on its citizens? That strikes me as... unlikely.

Why not host in east asia? Or southeast asia? Or south america? Or africa? Then you avoid both the government with incentive to spy on you (assuming you live in the EU) and american companies.

reply

upvote

by nickserv9 hours ago|

[-]

EU member *countries" certainly do, but that's true of all countries that have the ability.

If anything the EU puts limits on what EU member countries and companies can do. By hosting in one of the EU countries you have stronger legal guarantees on data privacy than in any other area. A possible exception is Switzerland (not a EU member), which historically has had even stronger privacy laws, though these have been weakened recently IIRC.

reply

upvote

by kergonath9 hours ago|

[-]

> Does the EU not want to spy on its citizens?

You do not seem to understand what the EU is. It is not a country, it does not have a police or anything like the NSA.

reply

upvote

by Phlogi5 hours ago|

[-]

No they don't want to spy. They want to protect their citizens data, that is why we have GDPR. The other areas you mention do not provide this legal certainty and have different approaches to data privacy.

reply

upvote

by julianlam17 hours ago|

[-]

I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models.

I know LLMs move at the speed of light (especially these past few quarters), but if Opus and GPT "a few months ago" were really like open weight models, then there's really no reason to not switch, especially for those who were using these models a few months ago.

Your codebase didn't change, so use the open weight model. Don't move the goalposts.

reply

upvote

by kgeist17 hours ago|

[-]

Every new proprietary model is "groundbreaking" and "look, it just solved task X that no other model could solve," only to be referred to as "that crappy previous-generation model" a month later.

So yeah, I'm totally fine using Kimi-2.7, GLM-5.2 or Deepseek-v4. I think we've already hit the ceiling and most improvements now seem to be from harness improvements and slightly better RL to improve reasoning/tool calling.

reply

upvote

by jbverschoor15 hours ago|

[-]

Not only that, but to me it seems that after a week the intelligence is being downscaled or routed. Maybe because of lack of capacity

reply

upvote

by conception5 hours ago|

[-]

You can check https://marginlab.ai/trackers/codex/

It’s pretty good at catching when performance is degraded. It was for a week or so before Fable launched for instance, probably due to a/b testing or capacity as you noted.

reply

upvote

by matheusmoreira15 hours ago|

[-]

There's at least the possibility that they intentionally degrade the models as time passes. We can't really verify that we're getting what we're paying for all of the time. All the more reason to invest in local inference.

reply

upvote

by inigyou14 hours ago|

[-]

What if the new model is exactly as good as the last model on launch day but better than the last model was on the new model's launch day because it was degraded? Every single time?

reply

upvote

by foo4211 hours ago|

[-]

Makes me think of [shepherd tones](Shepard tone - Wikipedia https://share.google/xooRbF7wIIhcsTt2J) which sounds like they're rising in pitch indefinitely

reply

upvote

by inigyou39 minutes ago|

[-]

why are you linking to Wikipedia in invalid markdown format, which wouldn't work on HN even if it was valid, to a site called share dot google?

reply

upvote

by no-name-here10 hours ago|

[-]

There are lots of benchmarks to compare the absolute values of different models on the same scale (as opposed to vibes (my apologies for the shorthand), etc.).

reply

upvote

by matheusmoreira12 hours ago|

[-]

The thought has definitely crossed my mind. I don't think it's true because there's definitely an improvement when new models are released.

Maybe the truth is the newest models aren't actually as impressive as we thought. Maybe our perception of progress is being manipulated via months of gradual, silent and unverifiable degradation.

reply

upvote

by LPisGood13 hours ago|

[-]

People talk about this a lot. What I have never seen is a discussion of methods they might employ to degrade the models.

Let’s say I’m a bad faith LLM operator, and I want to degrade my model so the next release looks better and people want to switch to the more expensive one. How would I do that?

reply

upvote

by nessex12 hours ago|

[-]

They would quantize the model. That'd make it cheaper to run, and have slightly worse output but it would still generate outputs with a similar feel, derived from a compressed version of the same knowledge base etc.

They wouldn't even need to do this uniformly, quantized versions of the model could be routed only a subset of the requests. They could do this to nerf the old model, or more likely just to give themselves more hardware to run the new one on by handling more requests on less hardware. Or to handle increased request volume as traffic ramps up faster than hardware can be provisioned.

Playing with local models at various quants, the degradation can be hard to spot. Sometimes it's only noticeable in aggregate. And even then, you never really know if you just got unlucky with a bad response due to RNG.

I've had Opus 4.6 fall into some weirdly incoherent loops that I rarely see from even Sonnet, that felt like the kind of thing I got frequently with Qwen3.5 9B on local. And the above applies... Was that just bad RNG? Or was my request to Opus routed to some lower quality variant? There's no great way for me to tell for any given request, nor any way to guarantee Anthropic _didn't_ do that.

reply

upvote

by OccamsMirror11 hours ago|

[-]

I have had the same experiences you've had with 4.6 and it was ever since they brought out 4.7. It's fairly obvious they're doing something like you've said here.

reply

upvote

by nessex10 hours ago|

[-]

Forgot to mention, but it was after the 4.7 release when I was still using 4.6 that I saw those loops too... Before that, 4.6 had been a pretty seamless experience.

reply

upvote

by tsss6 hours ago|

[-]

And guess what all the providers of open models do: They quantize, badly.

reply

upvote

by csunbird6 hours ago|

[-]

This is why you pay premium for trusted providers, who are verified to not quantize

reply

upvote

by maybe_pablo12 hours ago|

[-]

Weight quantization, n-expert capping, routing to smaller model, context window truncation, aggressive sampling constraints, lossy speculative decoding and probably more.

reply

upvote

by trollbridge7 hours ago|

[-]

I can't prove any of it, but it sure feels like that happens sometimes on Anthropic's platform.

I don't seem to get any of this with GPT-5.5 or GPT-5.5-Pro (not that I use 5.5-Pro enough to know for sure, but when I do use it, it never seems nerfed).

reply

upvote

by alfiedotwtf10 hours ago|

[-]

I'm pretty sure you could do n-expert capping on any MoE model with only a handful lines of changes to ik_llama.cpp, but yeah... my bet is the have various quantisations and run the lower ones at peak (along with different system prompts i.e we're GPU-bound right now. Get to the point with less chatter)

reply

upvote

by Tepix12 hours ago|

[-]

Use quantisation.

reply

upvote

by manyatoms14 hours ago|

[-]

Unless what you're getting is really explicitly spelled out in a contract, you should flatly assume that they're doing whatever they like whenever they like.

reply

upvote

by OtomotO13 hours ago|

[-]

Even if it's in the contract, but can't be verified.

reply

upvote

by taytus15 hours ago|

[-]

At current prices, and considering these OS Models' performance, investing in local inference sounds like a bad idea.

reply

upvote

by matheusmoreira15 hours ago|

[-]

Current prices are insane but at this point I'm starting to feel like it's an existential issue. I'm not a US citizen. At any point the USA could come up with some arbitrary export controls. Not having a computer capable of running at least Qwen is starting to actually seem risky to me.

At least it's going to be usable as a very high end gaming PC.

reply

upvote

by awakeasleep14 hours ago|

[-]

Why would you buy and build everything before the low probability catastrophe strikes, though? You don’t get any benefit from switching early and you pay a big opportunity cost.

reply

upvote

by Lapel274213 hours ago|

[-]

> low probability catastrophe

There is also a low probability that someone enters peace negotiations solely to threaten the negotiators with death, yet here we are. With these guys it is: Better safe than sorry.

reply

upvote

by inigyou14 hours ago|

[-]

because as soon as it strikes computer hardware will be completely unavailable to buy?

reply

upvote

by CamperBob213 hours ago|

[-]

Also, there's a nontrivial learning curve involved in running your own inference server, once you move past the casual-goofing-around-with-llama-server stage. If you care about not being a sharecropper on Sam's or Dario's plantation, you should consider learning the ropes. Even if you don't put these skills to immediate use in your day job.

I didn't appreciate this until I started down that road myself.

reply

upvote

by matheusmoreira12 hours ago|

[-]

> If you care about not being a sharecropper on Sam's or Dario's plantation

Couldn't have put it better myself. That's what all this comes down to. Owning the hardware, owning the inference. Not perpetually renting them out on a meter like in the dystopian future they're envisioning.

reply

upvote

by inigyou7 hours ago|

[-]

You also have the option to not use AI

reply

upvote

by matheusmoreira2 hours ago|

[-]

Yeah but the truth is I don't want to go back to the pre-LLM world. I've been programming alone for over ten years. Having a coding buddy to talk to, collaborate with or just bounce ideas off of quite literally changed my life. I don't want to go back to solo programming, and my projects aren't exactly swimming in a sea of active contributors.

reply

upvote

by CamperBob24 hours ago|

[-]

Not in the future, not if you want to get paid.

reply

upvote

by OtomotO13 hours ago|

[-]

Because you will not be the only one struggling to get the hardware in the "unlikely" case the POTUS blurts out another fart.

reply

upvote

by alfiedotwtf10 hours ago|

[-]

> At any point the USA could come up with some arbitrary export controls

lol his already happened with Fable!

reply

upvote

by jrm415 hours ago|

[-]

At current "proprietary inference company behavior," investing in local inference sounds like the exceedingly far more rational option.

Long term predictability ought to far outweigh a few more cycles of performance.

reply

upvote

by laserlight5 hours ago|

[-]

Don't forget the fact that you'll be questioned to death when you criticize the current generation of models, but somehow, when the new models arrive you'll be questioned to death if you don't find them better than the old ones.

reply

upvote

by trollbridge7 hours ago|

[-]

There are open models with groundbreaking innovations, like MiMo-2.5-Pro-UltraSpeed which you simply can't get anywhere else (there is no other model with those capabilities that I can get with 1000 token/second speed).

reply

upvote

by realusername15 hours ago|

[-]

There's also a lot of benchmark trickery going on, it's becoming harder to see how the latest models really improved.

The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.

reply

upvote

by bonesss14 hours ago|

[-]

I’m an LLM fan, but from an engineering perspective the idea of building atop services that palpably fluctuate in capacity, performance, and capability is nutty.

Even with minor automation I feel like I can watch OpenAI and Anthropic engineers fiddling in real-time. Tuesdays behaviour changes by Thursday, 10AMs production isn’t possible at 11:30AM. Nutty.

reply

upvote

by targafarian14 hours ago|

[-]

I chilled significantly on using Google for anything to do with business due to API (and offering) stability. (Still use Google for personal things.) But AI models seem orders of magnitude more fluid, so to my risk-averse eye, they're nothing I'd base my own business on.

reply

upvote

by senordevnyc7 hours ago|

[-]

Imagine having a business where you're at the mercy of the fluctuations in capacity, performance, and capability that your human employees display!

reply

upvote

by intothemild10 hours ago|

[-]

Since I started running my own inference server, I've had zero degradation that I didn't do myself. Basically the only time I see it get worse is if I drop one of the quants.

Which is what I suspect the providers are doing to fit more inference on the same amount of hardware over time.

reply

upvote

by Barbing14 hours ago|

[-]

Interesting, Claude might be doing better since I last checked:

https://marginlab.ai/trackers/claude-code-historical-perform...

There were at least a couple of these degradation trackers.

reply

upvote

by fsuts11 hours ago|

[-]

Agreed

reply

upvote

by 4fffs16 hours ago|

[-]

Correct. Anything else is pure marketing and you have fallen for it.

reply

upvote

by Aurornis15 hours ago|

[-]

> I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models

I experiment a lot with the open models and I’m getting tired of this trope. I’m not yet convinced that even the best open weight models are equal to Opus from “a few months” ago.

I know what the benchmarks say. I had higher hopes. My real experience just doesn’t match the benchmarks.

I also do a lot of work that even Opus 4.8 struggles with. When even the cutting edge LLMs aren’t all the way there yet, my motivation to switch to something even further behind just isn’t there.

reply

upvote

by iot_devs13 hours ago|

[-]

I would love if you could make some examples

reply

upvote

by CamperBob214 hours ago|

[-]

Have you found anything specific that the full-precision quant of GLM 5.2 can't do that Opus 4.8 can? I haven't, so far.

5.2 lives up to the hype. I don't find it to be the best at anything except coding. But for coding... yeah, it lives up to the hype. Not quite Opus 4.8-level, but I would feel comfortable comparing it to 4.5, at least if it had vision capabilities.

reply

upvote

by OtomotO12 hours ago|

[-]

> My real experience just doesn’t match the benchmarks.

That's exactly the problem I have... with Anthropic and "Open""AI"

reply

upvote

by dwoosley16 hours ago|

[-]

The only reason I'm on HN right now reading this post is because the Anthropic's API is down... so there's another point for self hosted.

reply

upvote

by qznc9 hours ago|

[-]

To be a little bit more precise than "a few months behind", what probably matters is before or after "Claude Opus 4.5 from Nov 24, 2025". That was the model which started the OpenClaw hype over Christmas.

reply

upvote

by itwaswatson14 hours ago|

[-]

We have a provider with Deepseek V4 flash at our work. It can handle 95% of the "actually functional" workload at a tenth of the cost. I still pull up beefier ones sometimes, but that's after some consideration.

The moat is so flat, it only gives +1 food and +1 production. +1 gold with a road.

reply

upvote

by calgoo9 hours ago|

[-]

Same, i feel that V4 Flash is great at task implementation, but im still looking at bigger models for design. Now, GLM 5.2 with high thinking is actually getting really close now. I have switched for all personal projects right now and am quite happy with the results. I think the magic is in the big context window (1m) + a lot of thinking gets us very close to at least Opus 4.6 level. Im currently running directly on z.ai with a lite coding plan, and have bought API credit on deekseek as well. I will be looking at EU based hosts next and then i might switch over some of the more critical flows.

reply

upvote

by taormina17 hours ago|

[-]

For that matter, the new models are shit. If I’m using Opus 4.6 anyway to get anything actually done, then great, we’re actually entirely caught up then.

reply

upvote

by 827a14 hours ago|

[-]

Intelligence is maybe a few months behind. But cost sadly is further behind. GLM-5.2 has a deceptively high cost during day-to-day usage for e.g. coding because 1) it has to think a ton more than GPT-5.5/Opus-4.8 to get to competitive results; 2) many providers are still figuring out caching; and 3) API pricing for Codex/Claude can be as high as 40x more than subscription pricing, which distorts the market.

reply

upvote

by Gigachad16 hours ago|

[-]

The reason for me is work pays for Github Copilot which doesn't have these open modals.

reply

upvote

by derwiki5 hours ago|

[-]

OOC did an LLM write this? The last sentence feels very LLM

reply

upvote

by 10 hours ago|

[-]

deleted

reply

upvote

by 14 hours ago|

[-]

deleted

reply

upvote

by TacticalCoder17 hours ago|

[-]

> I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models.

The really interesting thing is that it's typically those very same accounts who were explaining, a few months ago, that thanks to their commercial model they were gaining so much time and producing so much fantastic code.

A few months passes and suddenly the open-source model have caught up with the models that were gaining them so much time and that produced amazing code (in production everywhere for sure btw) but... It's impossible to work with these models.

Rinse and repeat.

The current models, according to them, are basically AGI and they can go fishing while paid subscriptions solve the world's problems.

But when it six months there shall be new closed, pricey, models and when the open ones shall have reach the level of Fable, we'll hear how it's impossible to work in late 2026 on a model that is "only at the level of Fable".

These people should have been snake-oil salesmen (and it could be what they actually are).

reply

upvote

by nemomarx16 hours ago|

[-]

My most charitable interpretation that there's some honeymoon effect for each release, and people genuinely feel very productive and useful for 2-3 months. By the time the next big model release happens they've seen some issues or run into something that makes them feel like the new model will fix all that and improve their flow so much, etc.

Not unusual in the tech space, but this has been basically constantly happening for two years now? I can't imagine the improvements are more than incremental at this point.

reply

upvote

by windexh8er14 hours ago|

[-]

They are generally referred to as the Kool-Aid drinkers. There's always something holding them back from open models. It's no different than the argument in the article. I've been daily driving Linux for well over 20 years at this point and while things have gotten easier they haven't gotten that much easier. There's always been a distro that's focused on new users or ease of use. I used to take for granted the Linux distro ecosystem but now worry how Microsoft, Apple and others will continue to try and legislate compute into a corner. I can appreciate good engineering, but when I look at OS X and Windows they're both failing end users in different ways.

Just like the OS ecosystem I think we'll see a similar trajectory with OAI, Anthropic and Google but on a much accelerated time scale. I think the lobbying has begun to lock in their fate for revenue - because none of them give a shit about their users. I do hope, however, that Anthropic continues to over rotate and continue to gimp their models into uselessness. I just asked Opus 4.8 the other day to look at some code as an adversary and summarize areas that should be addressed. Nothing specific and it shut down the conversation. However starting a new prompt and prodding the model from a different angle yielded the results I asked for directly. Pick a lane. Or, don't and continue to lose industry respect and consideration.

reply

upvote

by tonfreed16 hours ago|

[-]

Even just one of the smaller models is good enough for the grunt work I use them for 90% of the time. Currently doing most of my home hobby projects with OpenCode Go and Qwen 3.7 Plus, it's not great at diagnosing issues in the code, but if I can clearly articulate a test suite or boilerplate refactoring it works fine.

reply

upvote

by nomel1 minutes ago|

[-]

> I use them for 90% of the time

10% failure rate would drive me absolutely insane.

reply

upvote

by moomoo1114 hours ago|

[-]

ok but your competition using the latest models has an advantage

not all of us are doing noob shit lol

reply

upvote

by handoflixue8 hours ago|

[-]

You're being entirely unreasonable. 640 kilobytes of memory was enough for Bill Gates, and yet somehow your special project needs more?

reply

upvote

by 59nadir10 hours ago|

[-]

Heh, if you're using LLMs heavily for work I think odds are pretty good you're doing pretty trivial stuff. It might not be trivial to you, but you're probably just not very good at this.

reply

upvote

by derwiki5 hours ago|

[-]

Pretty sure the big quant shops heavily use LLM; maybe it’s trivial stuff and they just work 100 hrs/week?

reply

upvote

by 59nadir44 minutes ago|

[-]

Anyone who heavily uses LLMs for their work is pretty obviously ill-equipped for their work, and likely getting even worse at it with time, yes. You can throw around as many "things people from the US worship because they make money" positions/industries as you want. Nevermind that you're "pretty sure". People who both know what LLMs are actually capable of without major issues, plus are already capable enough to do their job don't need to use LLMs heavily, they'll use them for what little they're actually useful for. Only incompetent (or uninterested & incompetent) people lean on them very heavily.

Edit: To clarify what I mean by this:

Anyone who uses LLMs for larger-than-small-module code generation, pretend-not-vibecoding (a.k.a spec-driven development), or outright vibecoding, etc., is using an LLM "heavily", IMO.

The appropriate things to use them for is information retrieval, plus as a basic extra signal in debugging, code understanding, quality checks, and so on.

Also, it's not illegal to be incompetent. Most people were incompetent long before LLMs showed up, it's not some rarity.

reply

upvote

by tumdum_8 hours ago|

[-]

I find the attitude shown in this post very surprising. On the one hand, the post starts with a story of adopting Linux and other FOSS. The core of FOSS is giving its users the ability to understand and modify software they run. On the other hand, the rest of the post is about using a tool (LLM) that the author has no way to modify and no way to understand. Huge matrices of floats are at best comparable to compiled code. But the reality is even worse - it’s actually easier to decompile and understand proprietary software. Not to mention the fact the most of the time users can’t even run the “open” models since it requires hardware that most can’t afford.

How did we get from prising software freedoms to this?

reply

upvote

by 5424585 hours ago|

[-]

I’d disagree wrt “modify”. There are all sorts of tools for modifying LLM weights (ie to remove refusals, remove layers or experts, merge models, finetune, and more) and a quick glance at huggingface or civit will show those in very active use.

I don’t think the hardware requirements are relevant. If a research lab publishes the code their particle collider runs under the GPL, that doesn’t make it not OSS even though they’re the only ones on the planet with the hardware to run it.

reply

upvote

by tomjakubowski1 hours ago|

[-]

You can also edit binary distributions of models with means besides changing their weights. See "LLM Neuroanatomy: How I Topped the LLM Leaderboard Without Changing a Single Weight."

On the spectrum of:

  careful engineering--hacking--mad science

This kind of thing falls far towards the mad science end of the scale, but has proven effective.

https://dnhkng.github.io/posts/rys/

reply

upvote

by DrScientist8 hours ago|

[-]

What's amazing about these models is they are effectively a distillation of the internet in something that can fit onto your local machine [1] and be queried via natural language.

[1] It seems inevitable that decent local models will be possible as the technology and the hardware is improving at a rate beyond the growth of the knowledge base to be distilled.

reply

upvote

by GL264 hours ago|

[-]

What makes an open model worse is ultimately the budget : you have access to worse data, not SOTA models, less GPU compute time, and having a good fine tuning team is extremely expensive. Linux works because the entry barriers are purely on a software side : a lot of contributers all around the world can outclass any OS by contributing on their scale to Linux. All you need to contribute is a computer, and your brain. Open models don't have the same community push, they rely on core ressources that not anyone owns. And injecting them in the model costs too much money. If there are no public breakthroughs in the way we train large open models that makes community led models 10x better, the shift to open models will never happen on a large scale.

reply

upvote

by anuramat3 hours ago|

[-]

there is zero downside to not switching though: just use claude while it's good and subsidized, switch if rugpulled

reply

upvote

by layer81 hours ago|

[-]

The downside, apart from privacy concerns, is sending your money to parties you don’t want to support.

reply

upvote

by anuramat1 hours ago|

[-]

if you care about privacy, claude/gpt is strictly not an option and there's nothing to think about

> sending your money

akchyually if you do it right, you are sending negative money; fair enough otherwise

reply

upvote

by Aurornis15 hours ago|

[-]

The headline says one thing, then the article text says this:

> I’m hoping it’s going to be minimal.

I have multiple subscriptions and I pay per token to try out different LLM providers through OpenRouter. I also run open weight models locally.

I just can’t agree yet. The models from Anthropic and OpenAI really are that much better than anything else. The open weight models must be universally benchmaxxed across the board because my real world experience with them is very different than what the benchmarks imply. I get downvoted a lot for speaking about my experience because I don’t think it’s the reality that people want to hear right now, but it’s true for complex work.

I do think there are a lot of easier tasks that can be handled appropriately by the open weight models in the hands of a skilled operator. If an entire job is simple enough that you wouldn’t hesitate to hand it off to a junior with a little supervision then any model will do. However for a lot of the work I do, even Opus 4.8 on Max requires a lot of attention and extra steering and review to keep it on track. Fable did, too, though to a lesser degree. When I try to use the big open weight models (hosted, because they’re not running at reasonable speeds locally at a quantization I can tolerate) it feels like I spend more time waiting while they burn tokens for output that I probably have to reject anyway, at least for the bigger tasks. I wish they were there, but that’s not the case yet.

reply

upvote

by justusthane4 hours ago|

[-]

The article also contradicts itself halfway through:

> There remains a clear penalty for being an open LLM user.

The conversation here _around_ the article is interesting, but the article itself boils down to “I’m going to try using open models and hope for the best.”

reply

upvote

by iot_devs13 hours ago|

[-]

Do you have any example?

reply

upvote

by 13 hours ago|

[-]

deleted

reply

upvote

by whatever113 hours ago|

[-]

Claude started becoming useful for my coding purposes after it hit version 4.6. After that sure some nice to have additions but I think if I had 4.6 sonnet & opus as open weights, I would not need something more.

Having played a bit with Fable, reinforced the above.

reply

upvote

by JeremyNT4 hours ago|

[-]

Yeah for me the coding inflection point was relatively recently (GPT 5.3 perhaps). There's just a threshold they have to hit to be consistent enough to avoid having to redo work and only the later models started delivering it.

This certainly seems feasible for open weight models eventually, but I'm still extremely skeptical of the claims about reaching this level with any open weight model that can be run locally (nevermind the hardware costs to do so practically).

reply

upvote

by ch4s35 hours ago|

[-]

I agree and I'd love for local models to hat the sonnet 4.6 level but nothing seems really all that close, and I'm not particularly excited about giving money to deepseek.

reply

upvote

by PeterStuer11 hours ago|

[-]

While I agree with some of the gist of the article, 2 remarks:

1. Unfortunatly in my tests the open models do not (yet?) rival, at least Claude Opus, for software development/engineering and adjacent tasks.

2. Enjoy while it lasts. I'll be genuinly amazed these open models will not be declared 'illegal' under some security pretense by the end of the year. And I say 'pretense' because the primary driver will be regulatory capture and industry protectionism.

reply

upvote

by mirekrusin11 hours ago|

[-]

Banning models in US just strengthens competing states, ie. China.

reply

upvote

by 9 hours ago|

[-]

deleted

reply

upvote

by pkulak15 hours ago|

[-]

Sure. But OpenAI is the same price. Why would I pay $18/month for z.ai when OpenAI is $20/month?

reply

upvote

by CJefferson15 hours ago|

[-]

One big advantage I’ve found — people get attached to models (including me). With open models if you find one that works perfectly for you but the next version doesn’t, you can run the old one forever (or someone will for you)

reply

upvote

by itake14 hours ago|

[-]

But… the models will fall behind. As libraries and languages and tool calling updates or the world knowledge changes, the models decay.

Personally, I don’t like the change, but it’s just how technology works so I’d rather move with the flow than try to stick my foot down and freeze time.

reply

upvote

by hypfer8 hours ago|

[-]

> But… the models will fall behind.

Yes but why does that matter? If I am happy with its capabilities now, I will continue being happy with its capabilities in the future.

Yes, it cannot do the newest magic shit, but why does that matter? It can still do everything that existed up until that point, which is _a lot_.

Eventually, you might also need something new, but it's not like the world shifts over all problems that exist from <old> to <new> and any tech for <old> problems suddenly becomes obsolete?

reply

upvote

by itake8 hours ago|

[-]

ideally, the software produced should include the latest security patches.

If the model prefers a version of Ruby or node with an RCE, I guess you can burn tokens to teach the model how to avoid the introducing the vulnerability into your code?

That feels quite tedious and token inefficient..

reply

upvote

by hypfer8 hours ago|

[-]

I'm sorry, but.. are you being serious?

Yes. Yes. The only way one can write secure software is by always using the latest SOTA model. Anything else is inefficient and vulnerable.

I hate this platform

reply

upvote

by itake7 hours ago|

[-]

https://news.ycombinator.com/item?id=46809708

Maybe you missed this article, but vercel found it quite annoying to teach AI about the latest updates in the React Framework.

I think you’re confusing my point. I’m not saying that only SOTA models can write secure software, I’m saying that the models produced today will write software that’s considered insecure by 2034 standards, thus you would require to burn more tokens in AGENTS.md or burn more of your time to hand write code.

For example, you’re more than welcome to run Windows ME if it does everything you need it to, but that doesn’t mean Windows ME is a secure environment.

reply

upvote

by 0xbadcafebee3 hours ago|

[-]

Another solution might also be to stop reinventing the wheel every few years. New languages aren't producing better software. But people keep churning new languages out, and they become popular because humans have emotional attachment to inanimate things. If humans weren't so emotionally involved with the code, AI could happily produce C/C++ software indefinitely. (And if we could kick our dependence on the fucking browser for an application platform, we wouldn't need the horror that is the JavaScript ecosystem)

reply

upvote

by OtomotO12 hours ago|

[-]

No problem, "AI" will just write its own frameworks and libs then!

reply

upvote

by taytus15 hours ago|

[-]

This is a good point I never thought of. I appreciate it.

reply

upvote

by bob10295 hours ago|

[-]

Why pay a monthly fee when you can pay for exactly the # of tokens you actually consume?

The API rates are very affordable once you start to optimize for the fact that prepaid tokens seem to massively outperform other kinds of tokens.

I can often do with 1 million tokens what my peers have failed to do with 100 million. For me to spend more than $200/m in prepaid API tokens I'd have to pull a 007 work schedule.

reply

upvote

by baby_souffle4 hours ago|

[-]

> Why pay a monthly fee when you can pay for exactly the # of tokens you actually consume?

Because my 500m tokens so far this month would cost me about $500. My subscription is 100$/month.

reply

upvote

by slopinthebag27 minutes ago|

[-]

That’s insane. 500m tokens costs me $12 on Deepseek.

reply

upvote

by 0xbadcafebee12 hours ago|

[-]

One reason might be request limits. OpenAI's ChatGPT Plus w/Codex ($20/month) provides a worst-case 5-hour-request-limit of 15 for GPT-5.5, 20 for GPT-5.4, 60 for GPT-5.4-Mini. Whereas Z.ai Lite ($18/month) provides a worst-case of ~80 for GLM 5.2 (off-peak; on-peak is 2am-6am New York time). So Z.ai can provide higher limits for a cheaper price. (https://codeberg.org/mutablecc/calculate-ai-cost/src/branch/...)

reply

upvote

by pbgcp20269 hours ago|

[-]

Subscriptions are done. By the end of 2026 everyone will be paying for actual mils of tokens consumed, via API calls.

reply

upvote

by 0xbadcafebee4 hours ago|

[-]

I don't see any indicator of that happening. And actually token count pricing is frequently being replaced with "credits pricing", and subscriptions with obscure variable limits

reply

upvote

by fulafel14 hours ago|

[-]

https://news.ycombinator.com/item?id=48618455

reply

upvote

by pkulak14 hours ago|

[-]

I pay month to month.

reply

upvote

by notatoad13 hours ago|

[-]

the pricing page doesn't seem to call it out anymore, but the claim on z.ai coding plan used to be 3x the usage of the equivalent-price claude plan. whether that's accurate i don't know, but just based on api pricing GLM is way cheaper.

reply

upvote

by flexagoon9 hours ago|

[-]

OpenCode Go is $10/month and the limits are much more generous than those or Codex

reply

upvote

by aitchnyu5 hours ago|

[-]

After all the articles calculating OpenAI and Anthropic giving heavily subsidizing their subscriptions, how does OpenCode Go manage to be even cheaper?

reply

upvote

by 0xbadcafebee4 hours ago|

[-]

OpenAI and Anthropic are trying to pay off a half trillion dollars of investment. They also have the most demand right now, to the point that Anthropic sometimes doesn't have enough compute and that means more limits. They can't stop taking new customers, though, because the market would hate it.

An open weight inference provider only needs to pay for GPUs, or discounted APIs from 3rd party vendors. Same basic financial model but they didn't spend a trillion dollars so their loss isn't as high so they can afford to do more inference for less money, and their demand isn't as high so there's more than enough compute.

reply

upvote

by flexagoon5 hours ago|

[-]

Because it offers cheap open source models, not GPT and Claude. I mentioned it as an alternative to Z.ai's subscription in OP's comment, not to Codex.

reply

upvote

by bnj15 hours ago|

[-]

I’ve been wanting to get better acquainted with local inference but I don’t have the hardware, which has made me think about something I haven’t seen discussed, which is local collaboratives. The economics makes it seem like a group of people joining together to run good hardware and an open model might make sense, but I haven’t seen anything like this mentioned. Have I been missing it?

I think it would be pretty neat to launch a service helping people who wanted to participate in something like that locate one another.

reply

upvote

by Aurornis14 hours ago|

[-]

The reason you don't see more of this is because everyone does the math, realizes it's not a good deal, and then gives up on the idea.

There's a post at the top of /r/localllama about this exact math right now: https://www.reddit.com/r/LocalLLaMA/comments/1ubrcwj/tokenom...

TL;DR: Running GLM 5.2 is going to cost about $20K minimum, and that's going to be painfully slow compared to the cloud hosted versions. Even the estimates where the server is computing tokens 24/7 you can't break even for several years.

The only reason to run locally is if complete data privacy is your top concern. You pay a high premium for that.

reply

upvote

by wongarsu3 hours ago|

[-]

If you invest the minimum to run the model, obviously that's more expensive per-token than investing the optimum to get the best price/performance tradeoff (which for GLM 5.2 is at least five times that figure)

If you can bring the load to run the model on close to optimal hardware 24/7 with multiple concurrent requests, and have reasonably cheap power and AC, you would break even in a reasonable timespan. Which won't happen unless you are self-hosting for a medium-sized company. I guess you could sell your spare capacity to get better utilization ... and we've reinvented hosted inference

reply

upvote

by FridgeSeal10 hours ago|

[-]

I mean sure, I’d you’re attempting to run the biggest possible models, it’s going to require a stupid amount of compute? I thought we all knew this?

The appeal to me is that we can run that, but we can also run smaller models on your laptop _and it’s functional!_ I can run DeepSeek v4 flash and a qwen 3.6 on my laptop! Thats crazy good.

reply

upvote

by pjc506 hours ago|

[-]

.. conversely, all the cloud LLMs are being subsidized by their investors in addition to massive economies of scale.

reply

upvote

by Aurornis5 hours ago|

[-]

It is false to say that all cloud LLMs are subsidized. The open weights models are hosted through numerous third party providers on OpenRouter that are operating as hosting businesses. They aren’t spending investor money to provide tokens for you at below-cost rates. They’re operating as hosting businesses.

reply

upvote

by wongarsu3 hours ago|

[-]

economies of scale are enough to explain the entire price difference. Running 8 concurrent requests at 100 token/s on $100k hardware is a lot cheaper than running one concurrent request at 20 token/s on $20k hardware

reply

upvote

by uberex10 hours ago|

[-]

https://news.ycombinator.com/item?id=48524387

reply

upvote

by markerz14 hours ago|

[-]

There are plenty of providers of open models that offer very affordable rates. Generally, I recommend looking at OpenRouter since they track various metrics for the various providers.

reply

upvote

by blackoil15 hours ago|

[-]

Open models hosted in Cloud???

reply

upvote

by pbgcp20269 hours ago|

[-]

AWS Bedrock hosts Gemma 4 31B and this is The Best Deal – hands down. Try it. Vertex also has Gemma 4 MoE version. Not "lobotomised" by quants. There are also GLM (latest) and Qwen / DS (but these two are not latest versions)

reply

upvote

by reacharavindh9 hours ago|

[-]

It was easy to be a rebel and use Linux when it was clearly competent, but needed hacks and extra elbow grease to get it polished for use. IME, the open models are “not there yet” in terms of capability or operational needs. Sure, GLM5.2 looks competent, but I will only be able to get it to run that competent if I had a huge cluster of GPUs.. if I am accessing an open model via hosted API, I might as well run a closed model via hosted API. The incentives fall apart in comparison to using Linux 15 years ago.

Don’t get me wrong. I wish I could run a local model and be happy about it. At the moment, I’m not.

reply

upvote

by hypfer9 hours ago|

[-]

> if I am accessing an open model via hosted API, I might as well run a closed model via hosted API.

uh.. no?

The whole thing is that it cannot be enshittified, because there's not just a single party having control over it.

As it has happened, is happening and will happen.

With open weights, you cannot easily be rugpulled or locked out or any of that stuff. If the corp attempts that, someone else with an server farm will gladly take you as a customer with absolutely 0 changes to your workflow other than swapping out the API URL + Key.

You'll be talking to the same model with the same personality and same knowledge.

reply

upvote

by mdale17 hours ago|

[-]

I think the frontier will command premium for sometime just as slight better software developers were 10x's vs their peers as their architecture & development strategies and code approach compounded quickly. One less error per block of work compounds quickly.

Sure, there may be some cases and reasons for local models and industry is so large they will continue to make progress and gather economic value and users for specific use case; but frontier will command vast majority of the economic value distinct from Linux and open source where the model created better than proriatary economic incentives around development

reply

upvote

by byzantinegene16 hours ago|

[-]

10x developers were not slightly better than their peers, they were vastly superior and faster. OTOH, the lead of frontier llms is diminishing as training is getting diminishing returns.

Also, on that note. Not every company needs 10x developers, just as not every task needs frontier llms. Ultimately, operating costs will be the largest contributing factor.

reply

upvote

by 4fffs16 hours ago|

[-]

Youre clutching at straws.

Ultimately its a financial game. Open source is far cheaper so it already has an upper-hand. Frontier models have to justify financially why they are worth the additional spend.

reply

upvote

by radhitya17 hours ago|

[-]

Have you read about Opencode Go? They are great provider for open model, like GLM 5.2, Deepseek v4 Pro, Kimi 2.7 Code. You should give it shot to them :-)

reply

upvote

by 2muchtime12 hours ago|

[-]

The amount the HN community, at least from what I’ve seen, is sleeping on OpenCode Go (and zen) is kind of amazing.

$10 a month gets you generous usage with the best open weight models and they claim to have zero retention and not to train on your usage.

It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.

reply

upvote

by johndough11 hours ago|

[-]

> It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.

The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.

The advantage of OpenRouter compared to OpenCode Go is that the price for DeepSeek-V4-Pro and MiMo-V2.5-Pro is better on OpenRouter.

For example, DeepSeek costs $0.435/0.87/0.003625 for 1M in/out/cached tokens (https://openrouter.ai/deepseek/deepseek-v4-pro), compared to an equivalent of $1.74/3.48/0.0145 under the OpenCode Go plan (https://opencode.ai/docs/go/#usage-limits), almost exactly 4x.

But since you get a monthly usage limit of $60 with the OpenCode Go plan for $10 (i.e. 6x), you might still come out ahead if you use it a lot (or use other models, where the pricing difference is smaller or non-existent).

reply

upvote

by 2muchtime28 minutes ago|

[-]

So the cost makes sense I was unaware but

“The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.”

Opencode Go gives you a choice between “the best” open weight models and you’re not tied down to just GLM or MiniMax and Zen gives you an even longer list of providers including Claude and GPT?

Is it that Openrouter gives you access to like… every model and provider?

reply

upvote

by _pdp_9 hours ago|

[-]

There are downsides depending on how good is your harness. Switching the model is easy enough. Ensuring that the harness continues working the way it did is a completely different thing. This is not just about the prompts but also general behaviour around the model and its infrastructure.

So while it is not complicated and certainly something that can be solved, it is not plug and play.

That being said, we switch to open weight models earlier this month and the results has been more than positive so far. The cost savings are also hard to dismiss.

reply

upvote

by c-b4 hours ago|

[-]

What's confusing to me is that there is no discussion about the actual downside experienced it's just theoretical.

reply

upvote

by shever733 hours ago|

[-]

There seemed to be no real discussion about anything! I was expecting more of a conclusion, but the article did not support the proposition in the headline.

reply

upvote

by arttaboi11 hours ago|

[-]

I guess this will happen soon. There are two catalysts needed for this to happen:

1. Evals that can quickly tell you how much downside there is to switching 2. Something like OpenRouter that can help you run those evals quickly

Now #2 is starting to become popular, and I think we'll soon see more people adopting a model-agnostic approach. Of course, there will still be high-intelligence use cases where nothing comes close to Claude or GPT.

reply

upvote

by alexhans10 hours ago|

[-]

Exactly. I'm very happy the discourse has moved on from "but X model is the best" to "you can use open models".

Whether you're using SDK or harness based agents, having evals means you're able to modify any part of your agent and still know what satisfies your "good enough".

It's great for designing products that are easy to change as well.

reply

upvote

by ZeroGravitas9 hours ago|

[-]

It seems the best self-hosted and the worst models served by big providers has some considerable overlap in quality.

Whatever reason people have to run those (cheaper? backwards compatibility once you get something running) surely applies to the open models too, maybe even more so.

reply

upvote

by linzhangrun16 hours ago|

[-]

Open source models are still not good enough for now, but with the current speed of one new SOTA every two months, by this time next year we will definitely have cheap open source models at least as good as Fable :)

reply

upvote

by sho12 hours ago|

[-]

I don't think we will. The open model labs are too resource constrained to approach Fable or even Opus on the general case and I don't see that changing within a year.

Right now, due to profound shortfalls in both data and hardware compared to the US labs, the OSS models are IMO basically technology demonstrators that in practise are even more jagged than the US labs' efforts. The high points of the jaggedness are close - but number of happy paths is many times fewer, and their behaviour inside the harness is far less refined. Barring some incredible breakthrough I don't think that is changing without a much higher level of resources - which seems impossible given the current hardware environment.

I have no reason to think that Anthropic or OpenAI are in possession of some secret sauce that the Chinese labs can't duplicate given the right resources, but the fact remains that absent those resources they'll remain behind. Barring some incredible bombshell reveal from Huawei I don't think this asymmetry resolves in a year. In three years it may well be a different story.

reply

upvote

by linzhangrun11 hours ago|

[-]

deepseek-v4-pro, probably the representative cheap opensouce LLM, was released in 2026.4 One year before, what OAI had in hand was gpt-4.1 and gpt-o3. I think it is not very controversial to say that deepseek is stronger than them, at most you can point to some post-training problems, basically the instability you mentioned. Also I am not sure if it is because the people who are best at using AI -- the people making AI -- get more development speed as the models get smarter, but my feeling is model progress is getting faster and faster. GPT-3.5 and GPT-4 were almost one year apart. The disadvantage from hardware limits and compute shortage is visible from the size of chinese models. glm-5.2, which is claimed to be around opus-4.6 level in coding, is only 744B. But Chinese engineers are obviously, how to put it, getting very effective results on "performance at the same size". And that is not even talking about the advantages from China's electricity, manpower, or even "national will" to compete against America. So saying it may take three years to catch up with a gap that is now only several months looks too pessimistic. ChatGPT itself was released only three and a half years ago, and today is already a completely different world.

reply

upvote

by sho10 hours ago|

[-]

You may be right, and I certainly hope so!

But the question was about whether the Chinese labs will have fable-equivalence in 1 year. I am by no means some kind of insider, but knowing the vaguest outlines of what went into Mythos, they just can't do it. The compute is not there. The Chinese engineers are incredible, but they're not literal magicians.

Of course there could be something incredible to come out of left field and overturn the apple cart yet again, but that's speculation. It would be awesome, sure! But I wouldn't bet too heavily on it.

And FWIW - again, no disrespect at all to the Chinese engineers but I don't rate GLM5.2 as being even close to opus 4.6. It can hit a few benchmarks, sure, that's the top edge of the "jag". But filling in the rest of the capabilities - again, it takes compute and data the OSS labs just don't have, that anyone knows about at least.

reply

upvote

by myzek11 hours ago|

[-]

Any tips on which model to use and how to use them? I have 64 RAM and 16 VRAM (I know it's not a lot, it's a gaming GPU) and I'm trying to find a good model to use but it's a bit of a struggle

reply

upvote

by peter_retief12 hours ago|

[-]

What open models are "recommended"?

I like the Linux analogy, I struggled with Linux way back.

reply

upvote

by Animats11 hours ago|

[-]

OK, now what? Someone offers open models as a service? That's basically a time-sharing computing business - people at terminals sharing remote computing resources. If you buy your own H100 it will be idle while you're typing or reading or thinking. So sharing makes sense.

But it doesn't have to be an "AI company". It's just a compute service. The companies that offer web hosting could get into this.

reply

upvote

by flexagoon9 hours ago|

[-]

> The companies that offer web hosting could get into this.

They already do. DigitalOcean is one of the providers on OpenRouter, for example

reply

upvote

by HarHarVeryFunny3 hours ago|

[-]

There are lots of companies providing open models as a service. DeepInfra and Fireworks AI for example. Even Amazon for that matter.

reply

upvote

by petesergeant8 hours ago|

[-]

Headline: "The is minimal downside"

Article: "I’m hoping it’s going to be minimal"

reply

upvote

by PcChip18 hours ago|

[-]

Is it just me or is half the article missing?

I enjoyed the first part though

reply

upvote

by epolanski1 hours ago|

[-]

I unsubscribed from Anthropic and our (EU-based) team is moving to an "ai-server" running opencode + GLM 5.2 and DS4.

There are several benefits:

- we cut AI spending by thousands

- there is one AI server and starting different sessions for each user, one memory/skills/etc and everybody is involved into reviewing what went wrong and why. Harness finally makes sense and pays off more.

- we can trust that the models are those that we run and not black boxes

- no more money flowing to US narcissistic entrepeneurs and no more business being tied to US legislation

Not gonna lie, GPT 5.5 Pro and Fable 5 were a tiny bit ahead, especially on longer vibecode-style tasks, but it's just not worth it.

reply

upvote

by DANmode18 hours ago|

[-]

But, what model are you using?

and what hardware are you using?

reply

upvote

by 0gs18 hours ago|

[-]

yeah, on a 96GB Mac Studio and Gemma+Qwen, it's definitely fully doable. fully doable but not really for coding on 16GB. but svelter models and cheaper (eventually) hardware are coming!

reply

upvote

by nezuzen18 hours ago|

[-]

"cheaper (eventually) hardware" Best case 2-3 years from now. Otherwise it will take a major global recession to get us anywhere near last year's prices.

reply

upvote

by marcus_holmes15 hours ago|

[-]

Macs are expensive hardware, but I'm always seeing people running LLMs on them. Is anyone running on cheaper generic hardware and Linux?

reply

upvote

by numpad02 hours ago|

[-]

Qwen3.6-35B-A3B-Q4_K_M.gguf spread across few 8-16GB GPUs is cheap as reward points for a comparable Mac if you don't mind heat, noise, and not-blazing-fast generation speeds.

Most ATX cases only has 7 PCIe I/O shields and can't take more than 3x double slot cards, but many gaming systems can take 2x double slot full length 16GB cards, and they should be fine for many purposes. Cooling is most easily done by a squirrel cage fan mounted with a 3D printed bracket at the back.

Cheap parallel action crimping tools for Molex 5556 works too - PCIe 8-pin is NOT 5557, it's differently keyed, so the specifically PCIe intended housings have to be used for cables, if you are crimping your own.

No one is mining crypto anymore, and crypto PSUs are being dumped dirt cheap, should you want a stable bulk 12V supply.

reply

upvote

by brucehoult15 hours ago|

[-]

A Mac is cheaper than a high end GPU with the same amount of RAM.

reply

upvote

by marcus_holmes14 hours ago|

[-]

ah, right, so it's about Apple Silicon being fast enough to use instead of a GPU?

reply

upvote

by brucehoult13 hours ago|

[-]

They use the GPU but an Apple Silicon GPU has the same high speed access to all the RAM on the machine as the CPU does, rather than having its own walled-off maybe 16 GB VRAM in mainstream gaming GPUs or 24 GB in RTX 4090 or RTX 5090 (MSRP $1999 but in practice $3000-$4000 at the moment). Nvidia A100 (80GB VRAM) apparently cost $15,000 or so.

Not only does Apple's unified memory give the GPU more RAM to use, but it also eliminates copying things between CPU RAM and GPU RAM.

A Mac Mini with 48 GB RAM costs $1799. A Mac Studio with 96 GB RAM is $3999 — until March you could get a Mac Studio with 512 GB RAM for $3999, all of which could be used for your AI model.

https://www.tomshardware.com/tech-industry/apple-pulls-512-m...

Some are coming up used at silly prices.

https://www.trademe.co.nz/a/marketplace/computers/desktops/a...

NB NZ$44,999 is "only" US$25,772.

reply

upvote

by 0gs4 hours ago|

[-]

i believe you meant something like US$9999 for the 512GB. otherwise, i'm going to feel like QUITE the fool for choosing the 96GB variant at the same price

reply

upvote

by fsuts11 hours ago|

[-]

And use less power

reply

upvote

by Gigachad16 hours ago|

[-]

I suspect hosted and local will converge when hardware prices come down and API prices go up. The massive rate of datacenter build out will be unsustainable. Right now the hosted models are massively cheaper than buying the hardware and running it yourself which signals that hosted is very subsidized.

reply

upvote

by fluidcruft16 hours ago|

[-]

If you don't have that hardware thr math of buying a depreciating computer is challenging if you are satisfied with the $100/month plans ($1200/year). A 96GB Mac Studio is ~$4k. I think if you have the hardware already as a sunk cost then yes it makes sense. But I'm not sure it is worth spending $4k for today's hardware vs waiting for newer hardware in a few years.

reply

upvote

by OtomotO12 hours ago|

[-]

I am absolutely pro local and true open source models.

Personally I haven't seen any productivity gain since Opus 4.5 times.

But: I can't fully get behind the opinion that (so called) "open source models" are simply superior and will be in the future, because when I asked some models who they are, they answered with "I am Claude from Anthropic", which could mean they have been trained by exfiltrating Claude.

I have NO moral objection to this, as Anthropic and "Open""AI".also trained their models on anything they could get their hands on.

It's more about the question: can and will these models be updated, even if Anthropic et al fail. Who's gonna pay for training then? What's their incentive? Have we reached a plateau?

reply

upvote

by fuck_google9 hours ago|

[-]

[dead]

reply

upvote

by cpill15 hours ago|

[-]

I think once the hardware process comes down and these mini DGXs become cheaper, and by then open models still be smaller and better, there is going to be less and less reason to use the providers. CEOs are already complaining that they are costing too much. There are also large organisations like Banks which can't use external services and are already looking at internal housing. it's a good thing so the big AI companies just went IPO as once the self hosting trend kicks in they are going bust.

reply

upvote

by aussieguy123416 hours ago|

[-]

>There was a time not too long ago when using Linux entailed some professional risk1. First there was compatibility: you may not have been able to render a Word document or PowerPoint correctly, and you might have had to trust Open Office’s export capability to render docs the way you wanted

For a while during this era, I used to port my laptops windows installation into a virtual machine that can run on Linux. It took a bit of hacking away but I could usually do it in a day or two. Then its all Linux with the windows vm being used for the microsoft stuff.

reply

upvote

by blindriver15 hours ago|

[-]

As someone that has pretty powerful desktop that I've been using with local open weight models, people are far exaggerating the quality of them. Some of them are now useful. They don't compare yet to the online models of ChatGPT, Claude, Gemini, etc. They are still about 18 months behind. I have accomplished useful work with them, like image classification on Gemma4, but they are much much slower, much much more expensive and they don't scale at all.

A $10,000 RTX 6000 Blackwell card will pay for 500 months of Claude or Codex, which is 40 years worth of compute. Obviously they are going to raise their prices, my prediction being to $200-500/month, but that still makes them at least years of compute and they scale very well with more traffic. Single GPUs do not, they are pegged at 100% and good luck getting it to answer multiple queries at the same time.

reply

upvote

by causality016 hours ago|

[-]

I know open models have gotten quite good in many tasks such as coding or composition, but are there any that can access the internet and retrieve data like ChatGPT, Claude, etc can?

I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".

reply

upvote

by sleepyeldrazi14 hours ago|

[-]

That's why I like qwen3.6 27B, it has 0 ego, it knows that it doesn't have complete world knowledge, so when it sees a web_search tool it searches all the time. Even qwen3.5 9B is mostly search-eager (but given the size, it's weaker on reasoning on the results if that's needed). I use a stock pi harness with only web_search and web_fetch (cleans up the html to only keep text) tools defined.

I have given up on making Opus actually retrieve online information for me. At this point I only query it side by side with qwen to laugh at how it didn't even attempt to search properly, and how a small local model is beating it every time. Gemini is very fast for searching, but somehow miss-sources all the time.

reply

upvote

by wilj15 hours ago|

[-]

> I know open models have gotten quite good in many tasks such as coding or composition, but are there any that can access the internet and retrieve data like ChatGPT, Claude, etc can?

The things you describe are just tool calling, they're a feature of whatever harness you use. Use OpenCode, pi.dev, or maki.sh with any of the open models.

> I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".

You can do most of this with some system prompts added to whatever agent you're using. You can do it from the settings on the claude/chatgpt websites too. (minus the no-guardrails thing)

reply

upvote

by newwttbreak14 hours ago|

[-]

What are good resources and forums where I can figure out these system prompts to bypass guardrails, atleast on agents?

reply

upvote

by flexagoon8 hours ago|

[-]

There are tons of existing Skills/MCPs for Google/Kagi/whatever search, and making your own is trivial. I gave DeepSeek in Pi a link to Kagi API docs and asked it to add a web search skill, and it did that easily.

reply

upvote

by JSR_FDED15 hours ago|

[-]

Just go to kimi.com and try for yourself (not affiliated, but happy user).

First time I did this I realized in 5 seconds that the big players weren’t going to be carving up the market between them.

reply

upvote

by linzhangrun16 hours ago|

[-]

You can let the AI solve it itself, and then it will provide two solutions: implement a local search service (easily blocked), or purchase a Web Search API service

reply

upvote

by tr_user14 hours ago|

[-]

isn't that just in the harness?

reply

upvote

by impartshadow3 hours ago|

[-]

[flagged]

reply

upvote

by cws_ai_buddy15 hours ago|

[-]

[flagged]

reply

upvote

by fabijanbajo9 hours ago|

[-]

[dead]

reply

upvote

by Atom_Foundry8 hours ago|

[-]

[flagged]

reply

upvote

by c_chenfeng16 hours ago|

[-]

[dead]

reply

upvote

by codelong88816 hours ago|

[-]

[dead]

reply

upvote

by root_axis14 hours ago|

[-]

Imagine taking 6 months longer to release your cookie cutter CRUD app.

reply